A high-performance, lightweight router for large-scale vLLM deployments
☆233May 6, 2026Updated 3 weeks ago
Alternatives and similar repositories for router
Users that are interested in router are comparing it to the libraries listed below.
- High-performance RMSNorm implementation using SM core storage (registers and shared memory)☆30Jan 22, 2026Updated 4 months ago
- NVIDIA Inference Xfer Library (NIXL)☆1,041May 22, 2026Updated last week
- Cute layout visualization☆39Jan 18, 2026Updated 4 months ago
- ☆16Jul 9, 2024Updated last year
- FlexAttention w/ FlashAttention3 Support☆27Oct 5, 2024Updated last year
- ☆29Apr 17, 2025Updated last year
- Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i…☆64May 22, 2026Updated last week
- KV cache store for distributed LLM inference☆419Nov 13, 2025Updated 6 months ago
- A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual h…☆133May 20, 2026Updated last week
- Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Indus…☆273May 21, 2026Updated last week
- Benchmark tests supporting the TiledCUDA library.☆19Nov 19, 2024Updated last year
- ☆28Mar 17, 2024Updated 2 years ago
- A better, Rust-flavored wrapper for RDMA programming APIs☆84Updated this week
- MultiArchKernelBench: A Multi-Platform Benchmark for Kernel Generation☆53Updated this week
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling☆212Updated this week
- ☆12Mar 18, 2024Updated 2 years ago
- ☆21Jun 9, 2025Updated 11 months ago
- ☆17Aug 22, 2021Updated 4 years ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang☆62Nov 8, 2024Updated last year
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.☆5,401Updated this week
- ☆105May 31, 2025Updated 11 months ago
- ☆88Updated this week
- Master GPU programming step by step☆44May 15, 2026Updated 2 weeks ago
- llm-d helm charts and deployment examples☆57May 1, 2026Updated 3 weeks ago
- Supercharge Your LLM with the Fastest KV Cache Layer☆8,345Updated this week
- A tool to make spelling Thai more convenient☆11Mar 30, 2024Updated 2 years ago
- Efficient and easy multi-instance LLM serving☆551Mar 12, 2026Updated 2 months ago
- Lock-free buddy allocator based on binary heap☆14Mar 3, 2025Updated last year
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM☆453Updated this week
- ☆365Jan 28, 2026Updated 4 months ago
- Keyaki Treebank Parsed Corpus☆10May 15, 2019Updated 7 years ago
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu…☆320May 21, 2026Updated last week
- Kubernetes CSI Driver for serving OCI model artifacts☆27Updated this week
- Perplexity GPU Kernels☆576Nov 7, 2025Updated 6 months ago
- Open source code of BGL NSDI 2023☆17Apr 27, 2026Updated last month
- A userspace filesystem backed by Apache OpenDAL.☆36Jan 8, 2026Updated 4 months ago
- A parallelized VAE that avoids OOM for high-resolution image generation☆93May 8, 2026Updated 3 weeks ago
- ☆19Feb 14, 2023Updated 3 years ago
- A Sparse-tensor Communication Framework for Distributed Deep Learning☆13Nov 1, 2021Updated 4 years ago