vLLM performance dashboard
☆42 · Updated Apr 26, 2024

Alternatives and similar repositories for dashboard
Users interested in dashboard are comparing it to the libraries listed below.
- Benchmark tests supporting the TiledCUDA library. (☆18, updated Nov 19, 2024)
- Debug print operator for cudagraph debugging (☆14, updated Aug 2, 2024)
- The driver for LMCache core to run in vLLM (☆61, updated Feb 4, 2025)
- A practical way of learning Swizzle (☆37, updated Feb 3, 2025)
- Quantized Attention on GPU (☆44, updated Nov 22, 2024)
- An experimental communicating attention kernel based on DeepEP. (☆35, updated Jul 29, 2025)
- Kernel Library Wheel for SGLang (☆16, updated this week)
- torchvision-based transforms that provide access to parameterization (☆16, updated Dec 4, 2025)
- (WIP) Parallel inference for black-forest-labs' FLUX model. (☆19, updated Nov 18, 2024)
- Boosting 4-bit inference kernels with 2:4 Sparsity (☆93, updated Sep 4, 2024)
- TPU inference for vLLM, with unified JAX and PyTorch support. (☆243, updated this week)
- IBM development fork of https://github.com/huggingface/text-generation-inference (☆63, updated Sep 18, 2025)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆17, updated Jun 3, 2024)
- Handwritten GEMM using Intel AMX (Advanced Matrix Extensions) (☆17, updated Jan 11, 2025)
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators (☆19, updated Feb 24, 2026)
- Multiple GEMM operators constructed with CUTLASS to support LLM inference. (☆20, updated Aug 3, 2025)
- Official code implementation for "A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models" (☆20, updated Jul 24, 2024)
- Lightweight Python wrapper for OpenVINO, enabling LLM inference on NPUs (☆27, updated Dec 17, 2024)
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding (☆144, updated Dec 4, 2024)
- A study of CUTLASS (☆22, updated Nov 10, 2024)
- Large language models designed for formal theorem proving through tool-integrated reasoning. (☆33, updated Aug 13, 2025)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆25, updated Nov 7, 2025)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆267, updated Dec 4, 2025)
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… (☆271, updated Feb 20, 2026)
- An easy-to-use package for implementing SmoothQuant for LLMs (☆111, updated Apr 7, 2025)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆41, updated Feb 2, 2026)