huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
⭐ 325 · Updated 3 months ago
Alternatives and similar repositories for optimum-benchmark
Users interested in optimum-benchmark are comparing it to the libraries listed below.
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ⭐ 204 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐ 268 · Updated last month
- Easy and Efficient Quantization for Transformers ⭐ 202 · Updated 6 months ago
- ⭐ 322 · Updated this week
- Comparison of Language Model Inference Engines ⭐ 238 · Updated last year
- An innovative library for efficient LLM inference via low-bit quantization ⭐ 351 · Updated last year
- ⭐ 206 · Updated 8 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ⭐ 905 · Updated 3 weeks ago
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ⭐ 527 · Updated this week
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ⭐ 393 · Updated last year
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens ⭐ 978 · Updated last year
- Code for compression methods for transformers, accompanying our publications ⭐ 454 · Updated 11 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ⭐ 228 · Updated this week
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ⭐ 711 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components ⭐ 218 · Updated this week
- Benchmark suite for LLMs from Fireworks.ai ⭐ 84 · Updated last month
- OpenAI-compatible API for the TensorRT-LLM Triton backend ⭐ 218 · Updated last year