run-ai / llmperf
☆53 · Updated 7 months ago
Alternatives and similar repositories for llmperf:
Users interested in llmperf are comparing it to the libraries listed below.
- ☆50 · Updated 5 months ago
- Benchmark suite for LLMs from Fireworks.ai · ☆70 · Updated 2 months ago
- ☆117 · Updated last year
- LLM Serving Performance Evaluation Harness · ☆77 · Updated 2 months ago
- The driver for LMCache core to run in vLLM · ☆38 · Updated 3 months ago
- ☆186 · Updated 7 months ago
- A low-latency & high-throughput serving engine for LLMs · ☆351 · Updated 2 weeks ago
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray · ☆126 · Updated this week
- Boosting 4-bit inference kernels with 2:4 sparsity · ☆73 · Updated 8 months ago
- ☆84 · Updated last month
- Perplexity GPU Kernels · ☆272 · Updated this week
- Stateful LLM Serving · ☆65 · Updated last month
- Modular and structured prompt caching for low-latency LLM inference · ☆92 · Updated 5 months ago
- Fast and memory-efficient exact attention · ☆68 · Updated last week
- A minimal implementation of vllm · ☆40 · Updated 9 months ago
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding · ☆115 · Updated 5 months ago
- Applied AI experiments and examples for PyTorch · ☆262 · Updated last week
- ☆118 · Updated last year
- Comparison of Language Model Inference Engines · ☆215 · Updated 4 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks · ☆99 · Updated last year
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving · ☆307 · Updated 10 months ago
- ☆250 · Updated last week
- ☆45 · Updated 10 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… · ☆296 · Updated 2 weeks ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components · ☆194 · Updated this week
- SpotServe: Serving Generative Large Language Models on Preemptible Instances · ☆118 · Updated last year
- Easy and Efficient Quantization for Transformers · ☆197 · Updated 2 months ago
- Cataloging released Triton kernels · ☆220 · Updated 3 months ago
- ☆59 · Updated 10 months ago
- Efficient and easy multi-instance LLM serving · ☆398 · Updated this week