vllm-project / dashboardLinks

vLLM performance dashboard

☆33

Alternatives and similar repositories for dashboard

Users that are interested in dashboard are comparing it to the libraries listed below

Sorting:

IsaacRe / vllm-kvcompress
KV cache compression for high-throughput LLM inference
☆134Updated 6 months ago
neuralmagic / AutoFP8
☆195Updated 3 months ago
InternLM / turbomind
☆92Updated 4 months ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆123Updated 8 months ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆80Updated 11 months ago
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆203Updated this week
neuralmagic / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆142Updated this week
deepspeedai / DeepSpeed-Kernels
☆74Updated 4 months ago
flashinfer-ai / cutlass-viz
☆60Updated 3 months ago
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆79Updated 5 months ago
AniZpZ / AutoSmoothQuant
An easy-to-use package for implementing SmoothQuant for LLMs
☆103Updated 3 months ago
SqueezeBits / QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆118Updated last year
efeslab / fiddler
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆224Updated 8 months ago
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆198Updated last month
HandH1998 / QQQ
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
☆137Updated 3 months ago
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM
☆165Updated last year
wejoncy / QLLM
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.
☆175Updated 4 months ago
casper-hansen / AutoAWQ_kernels
☆76Updated 8 months ago
LLM-inference-router / vllm-router
vLLM Router
☆39Updated last year
vllm-project / recipes
Common recipes to run vLLM
☆68Updated last week
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆106Updated 2 months ago
microsoft / chunk-attention
☆78Updated 3 months ago
OpenBMB / CPM.cu
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec…
☆169Updated this week
IST-DASLab / MoE-Quant
Code for data-aware compression of DeepSeek models
☆42Updated last month
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆42Updated last month
usyd-fsalab / fp6_llm
An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6,FP5).
☆260Updated 3 weeks ago
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆180Updated this week
ROCm / TransformerEngine
☆41Updated this week
feifeibear / DPSKV3MFU
Estimate MFU for DeepSeekV3
☆25Updated 7 months ago
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266Updated 9 months ago