vLLM performance dashboard
☆44 · Apr 26, 2024 · Updated last year
Alternatives and similar repositories for dashboard
Users that are interested in dashboard are comparing it to the libraries listed below.
- Debug print operator for cudagraph debugging ☆14 · Aug 2, 2024 · Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- The driver for LMCache core to run in vLLM ☆64 · Feb 4, 2025 · Updated last year
- Quantized Attention on GPU ☆44 · Nov 22, 2024 · Updated last year
- ☆105 · Sep 9, 2024 · Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆94 · Sep 4, 2024 · Updated last year
- ☆33 · Feb 3, 2025 · Updated last year
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆19 · Nov 18, 2024 · Updated last year
- A practical way of learning Swizzle ☆37 · Feb 3, 2025 · Updated last year
- An experimental communicating attention kernel based on DeepEP. ☆35 · Jul 29, 2025 · Updated 8 months ago
- ☆65 · Apr 26, 2025 · Updated 11 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆22 · Updated this week
- ☕️ A vscode extension for netron, support *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc. ☆14 · Jun 4, 2023 · Updated 2 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆17 · Jun 3, 2024 · Updated last year
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs ☆27 · Dec 17, 2024 · Updated last year
- ☆21 · Mar 3, 2025 · Updated last year
- An easy-to-use package for implementing SmoothQuant for LLMs ☆111 · Apr 7, 2025 · Updated last year
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆63 · Sep 18, 2025 · Updated 6 months ago
- Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn. ☆15 · Jan 12, 2026 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Dec 4, 2025 · Updated 4 months ago
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆287 · Updated this week
- SGLang Kernel Wheel Index ☆20 · Updated this week
- Official Code Implementation for 'A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models' ☆20 · Jul 24, 2024 · Updated last year
- Handwritten GEMM using Intel AMX (Advanced Matrix Extension) ☆17 · Jan 11, 2025 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆60 · Oct 11, 2024 · Updated last year
- ☆209 · May 5, 2025 · Updated 11 months ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference. ☆20 · Aug 3, 2025 · Updated 8 months ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆145 · Dec 4, 2024 · Updated last year
- ☆43 · Nov 1, 2022 · Updated 3 years ago
- Cloud Native Benchmarking of Foundation Models ☆45 · Jul 31, 2025 · Updated 8 months ago
- Manages vllm-nccl dependency ☆17 · Jun 3, 2024 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆25 · Mar 5, 2026 · Updated last month
- study of cutlass ☆22 · Nov 10, 2024 · Updated last year
- ☆87 · Oct 17, 2025 · Updated 5 months ago
- Find, list, and inspect processes from Go (golang). ☆10 · Feb 4, 2018 · Updated 8 years ago
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆31 · Mar 28, 2025 · Updated last year
- torchvision-based transforms that provide access to parameterization ☆16 · Dec 4, 2025 · Updated 4 months ago
- AloePlayer: a cross-platform local media player. ☆17 · Jan 24, 2026 · Updated 2 months ago
- ☆63 · Feb 15, 2026 · Updated 2 months ago