A high-performance and lightweight router for large-scale vLLM deployments
☆214 · Apr 30, 2026 · Updated last week
Alternatives and similar repositories for router
Users interested in router are comparing it to the libraries listed below.
- High-performance RMSNorm implementation using SM core storage (registers and shared memory) ☆30 · Jan 22, 2026 · Updated 3 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆1,011 · Apr 30, 2026 · Updated last week
- Cute layout visualization ☆38 · Jan 18, 2026 · Updated 3 months ago
- 15-721 Spring 2024 - Cache #1 ☆12 · May 2, 2024 · Updated 2 years ago
- ☆16 · Jul 9, 2024 · Updated last year
- ☆28 · Apr 17, 2025 · Updated last year
- Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i… ☆56 · Updated this week
- KV cache store for distributed LLM inference ☆416 · Nov 13, 2025 · Updated 5 months ago
- Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Indus… ☆212 · Updated this week
- A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual h… ☆123 · Updated this week
- ☆28 · Mar 17, 2024 · Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- MultiArchKernelBench: A Multi-Platform Benchmark for Kernel Generation ☆52 · Mar 25, 2026 · Updated last month
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling ☆201 · Updated this week
- ☆21 · Jun 9, 2025 · Updated 10 months ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang ☆62 · Nov 8, 2024 · Updated last year
- ☆15 · Feb 23, 2025 · Updated last year
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆5,242 · Updated this week
- ☆99 · May 31, 2025 · Updated 11 months ago
- Expert Specialization MoE Solution based on CUTLASS ☆26 · Apr 14, 2026 · Updated 3 weeks ago
- llm-d helm charts and deployment examples ☆55 · Updated this week
- Supercharge Your LLM with the Fastest KV Cache Layer ☆8,187 · Updated this week
- We apply attention on LSTM to give more weight to the items which are more relevant for next-item prediction ☆11 · Dec 26, 2018 · Updated 7 years ago
- Efficient and easy multi-instance LLM serving ☆547 · Mar 12, 2026 · Updated last month
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu… ☆253 · Apr 30, 2026 · Updated last week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆391 · Updated this week
- Building LaTeX packages using Travis-CI ☆12 · Dec 21, 2019 · Updated 6 years ago
- Kubernetes CSI Driver for serving OCI model artifacts ☆25 · Apr 29, 2026 · Updated last week
- ☆15 · Jan 24, 2022 · Updated 4 years ago
- Perplexity GPU Kernels ☆570 · Nov 7, 2025 · Updated 6 months ago
- Notes and work-in-progress for BPF-related research projects ☆12 · Jan 10, 2025 · Updated last year
- A userspace filesystem backed by Apache OpenDAL. ☆36 · Jan 8, 2026 · Updated 3 months ago
- A parallel VAE that avoids OOM for high-resolution image generation ☆91 · Apr 21, 2026 · Updated 2 weeks ago
- ☆18 · Apr 22, 2026 · Updated 2 weeks ago
- A Sparse-tensor Communication Framework for Distributed Deep Learning ☆13 · Nov 1, 2021 · Updated 4 years ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆181 · Feb 11, 2026 · Updated 2 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆1,086 · Apr 29, 2026 · Updated last week
- ☆11 · Feb 13, 2025 · Updated last year
- Collaborative Filtering NN and CNN based recommender implemented with MXNet ☆12 · Apr 14, 2018 · Updated 8 years ago