ray-project / llmperf
LLMPerf is a library for validating and benchmarking LLMs
☆677 · Updated last week
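The blurb above describes LLMPerf as a benchmarking library. As a rough illustration of the kind of client-side metrics such a benchmark aggregates (time to first token, inter-token latency, output token throughput), here is a minimal sketch over synthetic per-request timings; the record format and the `summarize` helper are hypothetical, not llmperf's actual API.

```python
# Hypothetical sketch of llmperf-style aggregate metrics, NOT llmperf's API.
from statistics import mean, quantiles

def summarize(requests):
    """Each request is a dict with 'start', 'first_token', 'end' timestamps
    (seconds) and 'num_output_tokens'. Returns aggregate latency/throughput."""
    # Time to first token per request.
    ttfts = [r["first_token"] - r["start"] for r in requests]
    # Mean gap between consecutive output tokens per request.
    inter_token = [
        (r["end"] - r["first_token"]) / max(r["num_output_tokens"] - 1, 1)
        for r in requests
    ]
    total_tokens = sum(r["num_output_tokens"] for r in requests)
    # Wall-clock span covering all requests (concurrent clients overlap).
    wall_time = max(r["end"] for r in requests) - min(r["start"] for r in requests)
    return {
        "mean_ttft_s": mean(ttfts),
        "p90_ttft_s": quantiles(ttfts, n=10)[-1],
        "mean_inter_token_latency_s": mean(inter_token),
        "output_tokens_per_s": total_tokens / wall_time,
    }

# Synthetic example: two overlapping requests, 101 output tokens each.
reqs = [
    {"start": 0.0, "first_token": 0.25, "end": 2.25, "num_output_tokens": 101},
    {"start": 0.5, "first_token": 0.90, "end": 3.00, "num_output_tokens": 101},
]
stats = summarize(reqs)
print(stats)  # 202 tokens over a 3.0 s wall-clock span
```

A real harness would obtain these timestamps by streaming completions from an OpenAI-compatible endpoint under a configured concurrency level, which is the workload llmperf drives.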
Alternatives and similar repositories for llmperf:
Users interested in llmperf are comparing it to the libraries listed below.
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆790 · Updated this week
- Serving multiple LoRA fine-tuned LLMs as one ☆1,002 · Updated 7 months ago
- The Triton TensorRT-LLM Backend ☆724 · Updated this week
- [NeurIPS'24 Spotlight] Speeds up long-context LLM inference with approximate, dynamic sparse attention calculation, which reduces in… ☆835 · Updated this week
- A throughput-oriented high-performance serving framework for LLMs ☆654 · Updated 2 months ago
- ☆434 · Updated 11 months ago
- ☆201 · Updated this week
- RayLLM - LLMs on Ray ☆1,241 · Updated 6 months ago
- Comparison of Language Model Inference Engines ☆192 · Updated 3 months ago
- Efficient, Flexible and Portable Structured Generation ☆471 · Updated this week
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆178 · Updated 4 months ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,765 · Updated 10 months ago
- FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at batch sizes up to 16-32 tokens ☆655 · Updated 3 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆274 · Updated this week
- FlashInfer: Kernel Library for LLM Serving ☆1,552 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference ☆1,826 · Updated this week
- Scalable data pre-processing and curation toolkit for LLMs ☆668 · Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆877 · Updated this week
- Official implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24) ☆854 · Updated 3 weeks ago
- Multi-LoRA inference server that scales to thousands of fine-tuned LLMs ☆2,254 · Updated this week
- Scalable toolkit for efficient model alignment ☆645 · Updated this week
- Making Long-Context LLM Inference 10x Faster and 10x Cheaper ☆286 · Updated this week
- Minimalistic large language model 3D-parallelism training ☆1,331 · Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆1,755 · Updated this week
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,161 · Updated 2 months ago
- ☆479 · Updated 3 months ago
- ☆49 · Updated 3 months ago
- A high-performance inference system for large language models, designed for production environments ☆397 · Updated 3 weeks ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed ☆1,923 · Updated 3 weeks ago