fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆51 · Updated this week
Related projects:
- Experiments with inference on Llama ☆106 · Updated 3 months ago
- ☆42 · Updated this week
- ☆27 · Updated last month
- ☆32 · Updated this week
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆87 · Updated last year
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆55 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆150 · Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆52 · Updated this week
- ☆75 · Updated 3 weeks ago
- ☆61 · Updated 2 weeks ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆47 · Updated 2 weeks ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆104 · Updated 3 months ago
- Comprehensive analysis of the performance differences between QLoRA, LoRA, and full fine-tunes. ☆81 · Updated last year
- Experiments on speculative sampling with Llama models ☆114 · Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters ☆217 · Updated 2 months ago
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆68 · Updated 2 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆73 · Updated last month
- ☆145 · Updated last month
- Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry ☆36 · Updated 8 months ago
- A pipeline for LLM knowledge distillation ☆68 · Updated last month
- The code for the paper "ROUTERBENCH: A Benchmark for Multi-LLM Routing System" ☆86 · Updated 3 months ago
- ReLM is a Regular Expression engine for Language Models ☆100 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆250 · Updated this week
- ☆201 · Updated 7 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆95 · Updated this week
- LLM Serving Performance Evaluation Harness ☆45 · Updated 3 weeks ago
- ☆170 · Updated this week
- Spherically merge PyTorch/HF-format language models with minimal feature loss. ☆107 · Updated last year
- Benchmark for online serving of machine learning models (LLM, embedding, Stable-Diffusion, Whisper) ☆27 · Updated last year
- A collection of all available inference solutions for LLMs ☆65 · Updated 2 weeks ago