vLLM performance dashboard
☆43 · Apr 26, 2024 · Updated last year
Alternatives and similar repositories for dashboard
Users interested in dashboard are comparing it to the libraries listed below.
- Debug print operator for cudagraph debugging ☆14 · Aug 2, 2024 · Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- The driver for LMCache core to run in vLLM ☆63 · Feb 4, 2025 · Updated last year
- Quantized Attention on GPU ☆44 · Nov 22, 2024 · Updated last year
- ☆105 · Sep 9, 2024 · Updated last year
- Boosting 4-bit inference kernels with 2:4 sparsity ☆94 · Sep 4, 2024 · Updated last year
- ☆33 · Feb 3, 2025 · Updated last year
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆19 · Nov 18, 2024 · Updated last year
- An experimental communicating attention kernel based on DeepEP. ☆35 · Jul 29, 2025 · Updated 7 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆21 · Updated this week
- ☕️ A VS Code extension for Netron; supports *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc. ☆14 · Jun 4, 2023 · Updated 2 years ago
- Hypercorn is an ASGI and WSGI server based on the Hyper libraries and inspired by Gunicorn. ☆14 · Jan 12, 2026 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆17 · Jun 3, 2024 · Updated last year
- Lightweight Python wrapper for OpenVINO, enabling LLM inference on NPUs ☆27 · Dec 17, 2024 · Updated last year
- ☆17 · Aug 5, 2025 · Updated 7 months ago
- ☆21 · Mar 3, 2025 · Updated last year
- An easy-to-use package for implementing SmoothQuant for LLMs ☆111 · Apr 7, 2025 · Updated 11 months ago
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators ☆19 · Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆63 · Sep 18, 2025 · Updated 6 months ago
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆262 · Mar 17, 2026 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Dec 4, 2025 · Updated 3 months ago
- SGLang Kernel Wheel Index ☆17 · Updated this week
- OpenAI Plugins ☆77 · Updated this week
- Official code implementation for "A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models" ☆20 · Jul 24, 2024 · Updated last year
- Handwritten GEMM using Intel AMX (Advanced Matrix Extensions) ☆17 · Jan 11, 2025 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆60 · Oct 11, 2024 · Updated last year
- ☆207 · May 5, 2025 · Updated 10 months ago
- Multiple GEMM operators constructed with CUTLASS to support LLM inference. ☆19 · Aug 3, 2025 · Updated 7 months ago
- HomeKit smart home control via MCP — lights, locks, thermostats, and scenes for Claude Desktop, Claude Code, and OpenClaw ☆64 · Mar 17, 2026 · Updated last week
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆144 · Dec 4, 2024 · Updated last year
- ☆43 · Nov 1, 2022 · Updated 3 years ago
- ☆22 · Apr 17, 2025 · Updated 11 months ago
- Cloud-native benchmarking of foundation models ☆45 · Jul 31, 2025 · Updated 7 months ago
- Manages the vllm-nccl dependency ☆17 · Jun 3, 2024 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆25 · Mar 5, 2026 · Updated 2 weeks ago
- ☆87 · Oct 17, 2025 · Updated 5 months ago
- torchvision-based transforms that provide access to parameterization ☆16 · Dec 4, 2025 · Updated 3 months ago
- Find, list, and inspect processes from Go (golang). ☆10 · Feb 4, 2018 · Updated 8 years ago
- AloePlayer: a cross-platform local media player. ☆17 · Jan 24, 2026 · Updated 2 months ago