huggingface / inference-benchmarker

Inference server benchmarking tool

☆130

Alternatives and similar repositories for inference-benchmarker

Users who are interested in inference-benchmarker are comparing it to the repositories listed below.

neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆267 Updated last year
run-ai / runai-model-streamer
☆267 Updated last week
huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆320 Updated 2 months ago
lapp0 / lm-inference-engines
Comparison of Language Model Inference Engines
☆236 Updated 11 months ago
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆327 Updated this week
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆140 Updated this week
huggingface / kernel-builder
👷 Build compute kernels
☆190 Updated this week
vllm-project / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆730 Updated this week
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆84 Updated last week
triton-inference-server / vllm_backend
☆317 Updated last week
huggingface / kernels
Load compute kernels from the Hub
☆337 Updated last week
apple / ml-recurrent-drafter
☆219 Updated 10 months ago
intel / neural-speed
An innovative library for efficient LLM inference via low-bit quantization
☆350 Updated last year
npuichigo / openai_trtllm
OpenAI compatible API for TensorRT LLM triton backend
☆218 Updated last year
snowflakedb / ArcticTraining
ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)
☆257 Updated this week
huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
☆319 Updated 2 months ago
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆203 Updated 5 months ago
intel / auto-round
Advanced quantization toolkit for LLMs and VLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Bits and seamless integration with …
☆735 Updated this week
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆392 Updated 5 months ago
ServiceNow / Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
☆262 Updated last week
unslothai / unsloth-zoo
Utils for Unsloth https://github.com/unslothai/unsloth
☆177 Updated this week
vllm-project / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆214 Updated this week
premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆139 Updated last year
vllm-project / recipes
Common recipes to run vLLM
☆256 Updated this week
vllm-project / dashboard
vLLM performance dashboard
☆38 Updated last year
NVIDIA / kvpress
LLM KV cache compression made easy
☆701 Updated this week
neuralmagic / AutoFP8
☆205 Updated 7 months ago
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆93 Updated last week
huggingface / optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
☆201 Updated this week
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆234 Updated last week