A high-performance, lightweight router for large-scale vLLM deployments
☆268May 6, 2026Updated last month
Alternatives and similar repositories for router
Users that are interested in router are comparing it to the libraries listed below.
- NVIDIA Inference Xfer Library (NIXL)☆1,079Updated this week
- 15-721 Spring 2024 - Cache #1☆12May 2, 2024Updated 2 years ago
- Cute layout visualization☆40Jan 18, 2026Updated 5 months ago
- ☆16Jul 9, 2024Updated last year
- FlexAttention w/ FlashAttention3 Support☆27Oct 5, 2024Updated last year
- ☆30Apr 17, 2025Updated last year
- KV cache store for distributed LLM inference☆421Nov 13, 2025Updated 7 months ago
- Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i…☆75Updated this week
- Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, TRT-LLM, TokenSpeed, SGLang, OpenAI, Gemini &…☆328Jun 12, 2026Updated last week
- A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual h…☆149Updated this week
- A better wrapper for using RDMA programming APIs in Rust flavor☆92Updated this week
- MultiArchKernelBench: A Multi-Platform Benchmark for Kernel Generation☆57Jun 11, 2026Updated last week
- ☆21Jun 9, 2025Updated last year
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang☆62Nov 8, 2024Updated last year
- ☆14Feb 23, 2025Updated last year
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.☆5,569Updated this week
- ☆108May 31, 2025Updated last year
- Expert Specialization MoE Solution based on CUTLASS☆27Apr 14, 2026Updated 2 months ago
- Work through GPU programming step by step☆47Jun 4, 2026Updated 2 weeks ago
- llm-d helm charts and deployment examples☆58May 1, 2026Updated last month
- Efficient and easy multi-instance LLM serving☆553Mar 12, 2026Updated 3 months ago
- Building LaTeX packages using Travis-CI☆12Dec 21, 2019Updated 6 years ago
- Lock-free buddy allocator based on binary heap☆14Mar 3, 2025Updated last year
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM☆515Updated this week
- ☆367Jan 28, 2026Updated 4 months ago
- Keyaki Treebank Parsed Corpus☆10May 15, 2019Updated 7 years ago
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu…☆368Updated this week
- Kubernetes CSI Driver for serving OCI model artifacts☆27May 25, 2026Updated 3 weeks ago
- ☆15Jan 24, 2022Updated 4 years ago
- Perplexity GPU Kernels☆586Nov 7, 2025Updated 7 months ago
- Notes and work-in-progress for BPF-related research projects☆12Jan 10, 2025Updated last year
- Open source code of BGL NSDI 2023☆17Apr 27, 2026Updated last month
- A userspace filesystem backed by Apache OpenDAL.☆38Jun 2, 2026Updated 2 weeks ago
- A parallel VAE that avoids OOM for high-resolution image generation☆94May 8, 2026Updated last month
- ☆18May 6, 2026Updated last month
- ☆19Feb 14, 2023Updated 3 years ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)☆1,060Mar 24, 2026Updated 2 months ago
- ☆11Feb 13, 2025Updated last year
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs☆1,254Updated this week