huggingface / optimum-benchmark

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

☆318

Alternatives and similar repositories for optimum-benchmark

Users who are interested in optimum-benchmark are comparing it to the libraries listed below.

NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆202 Updated 3 months ago
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266 Updated last year
huggingface / optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
☆200 Updated last week
neuralmagic / AutoFP8
☆205 Updated 5 months ago
triton-inference-server / vllm_backend
☆302 Updated this week
intel / neural-speed
An innovative library for efficient LLM inference via low-bit quantization
☆349 Updated last year
IST-DASLab / marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆916 Updated last year
microsoft / TransformerCompression
For releasing code related to compression methods for transformers, accompanying our publications
☆446 Updated 9 months ago
lapp0 / lm-inference-engines
Comparison of Language Model Inference Engines
☆231 Updated 10 months ago
SqueezeAILab / SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
☆704 Updated last year
Cornell-RelaxML / QuIP
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆385 Updated last year
fpgaminer / GPTQ-triton
GPTQ inference Triton kernel
☆310 Updated 2 years ago
mobiusml / hqq
Official implementation of Half-Quadratic Quantization (HQQ)
☆883 Updated last month
SqueezeAILab / KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
☆389 Updated last year
Vahe1994 / SpQR
☆546 Updated 10 months ago
npuichigo / openai_trtllm
OpenAI compatible API for TensorRT LLM triton backend
☆215 Updated last year
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆215 Updated this week
run-ai / llmperf
☆58 Updated last year
huggingface / inference-benchmarker
Inference server benchmarking tool
☆118 Updated 3 weeks ago
yuhuixu1993 / qa-lora
Official PyTorch implementation of QA-LoRA
☆141 Updated last year
anyscale / llm-continuous-batching-benchmarks
☆121 Updated last year
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆283 Updated this week
wejoncy / QLLM
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.
☆180 Updated 6 months ago
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆82 Updated last week
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆299 Updated 2 months ago
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆90 Updated this week
intel / auto-round
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU.
☆668 Updated this week
Cornell-RelaxML / quip-sharp
☆559 Updated 11 months ago
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 Updated last year
vllm-project / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆171 Updated last week