huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
⭐ 301 · Updated last week
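Since the listing itself gives no usage hints, here is a minimal sketch of launching a benchmark through optimum-benchmark's Python API, based on the example in the project's README. The class names and fields shown (`Benchmark`, `BenchmarkConfig`, `ProcessConfig`, `InferenceConfig`, `PyTorchConfig`) are assumptions that may differ between releases, so verify them against the installed version.

```python
# Minimal sketch of a PyTorch inference benchmark with optimum-benchmark.
# Assumes the Python API shown in the project's README; exact class names
# and config fields may differ across releases.
from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    InferenceConfig,
    ProcessConfig,
    PyTorchConfig,
)
from optimum_benchmark.logging_utils import setup_logging

setup_logging(level="INFO")

if __name__ == "__main__":
    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2",
        launcher=ProcessConfig(),  # run the benchmark in an isolated process
        scenario=InferenceConfig(latency=True, memory=True),  # what to measure
        backend=PyTorchConfig(
            model="gpt2",
            device="cpu",
            no_weights=True,  # benchmark with randomly initialized weights
        ),
    )
    benchmark_report = Benchmark.launch(benchmark_config)
    print(benchmark_report)  # latency and memory metrics
```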
Alternatives and similar repositories for optimum-benchmark
Users interested in optimum-benchmark are comparing it to the libraries listed below:
- Easy and Efficient Quantization for Transformers · ⭐ 198 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ⭐ 263 · Updated 7 months ago
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens · ⭐ 831 · Updated 8 months ago
- ⭐ 193 · Updated 3 weeks ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization