A high-performance, lightweight router for large-scale vLLM deployments
☆160 · Mar 20, 2026 · Updated this week
Alternatives and similar repositories for router
Users interested in router are comparing it to the libraries listed below.
- High-performance RMSNorm implementation using SM core storage (registers and shared memory); a kernel sketch illustrating the idea follows this list ☆30 · Jan 22, 2026 · Updated 2 months ago
- CuTe layout visualization ☆33 · Jan 18, 2026 · Updated 2 months ago
- Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i… ☆40 · Mar 20, 2026 · Updated last week
- NVIDIA Inference Xfer Library (NIXL) ☆945 · Mar 20, 2026 · Updated last week
- ☆28 · Apr 17, 2025 · Updated 11 months ago
- KV cache store for distributed LLM inference ☆400 · Nov 13, 2025 · Updated 4 months ago
- A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual h… ☆103 · Mar 19, 2026 · Updated last week
- MultiArchKernelBench: A Multi-Platform Benchmark for Kernel Generation ☆46 · Updated this week
- ☆28 · Mar 17, 2024 · Updated 2 years ago
- An ergonomic Rust-flavored wrapper around RDMA programming APIs ☆80 · Mar 15, 2026 · Updated last week
- ☆12 · Mar 18, 2024 · Updated 2 years ago
- ☆20 · Jun 9, 2025 · Updated 9 months ago
- ☆94 · May 31, 2025 · Updated 9 months ago
- Expert-specialization MoE solution based on CUTLASS ☆27 · Jan 19, 2026 · Updated 2 months ago
- VUT FIT video server extension ☆10 · Jan 27, 2023 · Updated 3 years ago
- Efficient and easy multi-instance LLM serving ☆536 · Mar 12, 2026 · Updated 2 weeks ago
- Building LaTeX packages using Travis-CI ☆12 · Dec 21, 2019 · Updated 6 years ago
- Lock-free buddy allocator based on a binary heap ☆13 · Mar 3, 2025 · Updated last year
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,953 · Updated this week
- Kubernetes CSI Driver for serving OCI model artifacts ☆24 · Updated this week
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- Supercharge Your LLM with the Fastest KV Cache Layer ☆7,745 · Updated this week
- ☆15 · Jan 24, 2022 · Updated 4 years ago
- Perplexity GPU Kernels ☆564 · Nov 7, 2025 · Updated 4 months ago
- Notes and work-in-progress for BPF-related research projects ☆12 · Jan 10, 2025 · Updated last year
- A Sparse-tensor Communication Framework for Distributed Deep Learning ☆13 · Nov 1, 2021 · Updated 4 years ago
- A parallel VAE that avoids OOM for high-resolution image generation ☆89 · Mar 12, 2026 · Updated 2 weeks ago
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer ☆172 · Feb 11, 2026 · Updated last month
- ☆17 · Mar 4, 2026 · Updated 3 weeks ago
- ☆11 · Feb 13, 2025 · Updated last year
- NVIDIA cuTile learn ☆165 · Dec 9, 2025 · Updated 3 months ago
- Collaborative-filtering recommender based on NN and CNN, implemented with MXNet ☆12 · Apr 14, 2018 · Updated 7 years ago
- [NSDI '24] DINT: Fast In-Kernel Distributed Transactions with eBPF ☆53 · Jul 6, 2024 · Updated last year
- ☆37 · Jan 10, 2026 · Updated 2 months ago
- ☆10 · Apr 20, 2025 · Updated 11 months ago
- ☆15 · May 20, 2022 · Updated 3 years ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs ☆1,273 · Aug 28, 2025 · Updated 6 months ago
- A PyTorch-based GPU-parallel environment for the IPPS problem, supporting DRL, IL, and learning-guided MCTS ☆17 · Oct 4, 2025 · Updated 5 months ago
- High-performance KV cache storage for LLM inference: GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and S… ☆27 · Mar 20, 2026 · Updated last week
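
The RMSNorm entry above names a concrete technique: keep per-thread accumulation in registers and do the cross-thread reduction in shared memory, one row per thread block. Purely as an illustration, here is a minimal sketch of that pattern; the kernel name, launch shape, and float-only types are assumptions for this sketch, not details taken from the listed repository.

```cuda
// Hypothetical RMSNorm sketch: y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i].
// One block per row; blockDim.x must be a power of two for the reduction.
#include <cuda_runtime.h>

__global__ void rmsnorm_kernel(const float* __restrict__ x,
                               const float* __restrict__ weight,
                               float* __restrict__ y,
                               int hidden, float eps) {
    extern __shared__ float smem[];  // one partial sum per thread
    const float* row_in  = x + (size_t)blockIdx.x * hidden;
    float*       row_out = y + (size_t)blockIdx.x * hidden;

    // Each thread accumulates its strided slice of sum(x^2) in a register.
    float acc = 0.f;
    for (int i = threadIdx.x; i < hidden; i += blockDim.x) {
        float v = row_in[i];
        acc += v * v;
    }
    smem[threadIdx.x] = acc;
    __syncthreads();

    // Tree reduction across the block in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) smem[threadIdx.x] += smem[threadIdx.x + s];
        __syncthreads();
    }
    float inv_rms = rsqrtf(smem[0] / hidden + eps);

    // Normalize and scale, again striding over the row.
    for (int i = threadIdx.x; i < hidden; i += blockDim.x) {
        row_out[i] = row_in[i] * inv_rms * weight[i];
    }
}

// Example launch: dynamic shared memory sized to one float per thread.
// rmsnorm_kernel<<<rows, 256, 256 * sizeof(float)>>>(x, w, y, hidden, 1e-6f);
```

Production kernels typically go further (warp-shuffle reductions, vectorized loads, fused residual adds); this sketch only shows the register-plus-shared-memory structure the entry refers to.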