A high-performance, lightweight router for large-scale vLLM deployments
☆131Mar 1, 2026Updated this week
Alternatives and similar repositories for router
Users that are interested in router are comparing it to the libraries listed below
- 15-721 Spring 2024 - Cache #1☆12May 2, 2024Updated last year
- High-performance RMSNorm implementation using SM core storage (registers and shared memory)☆30Jan 22, 2026Updated last month
- KV cache store for distributed LLM inference☆396Nov 13, 2025Updated 3 months ago
- ☆20Jun 9, 2025Updated 8 months ago
- ☆27Apr 17, 2025Updated 10 months ago
- A lightweight vLLM simulator for mocking out replicas.☆87Updated this week
- Large language model deployment on the Ascend 310 chip☆24Jun 14, 2024Updated last year
- MultiArchKernelBench: A Multi-Platform Benchmark for Kernel Generation☆43Feb 8, 2026Updated 3 weeks ago
- ☆88May 31, 2025Updated 9 months ago
- A user-friendly Command & Control (C&C) web platform for remote monitoring, management, and task automation across multiple devices.☆14Dec 15, 2024Updated last year
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu…☆162Updated this week
- NVIDIA Inference Xfer Library (NIXL)☆898Feb 28, 2026Updated last week
- mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations☆70Jan 12, 2026Updated last month
- NVIDIA cuTile learn☆165Dec 9, 2025Updated 2 months ago
- python library☆12Nov 25, 2025Updated 3 months ago
- Repository for go shared libraries (for now).☆11Dec 1, 2025Updated 3 months ago
- ☆10Dec 30, 2020Updated 5 years ago
- Perplexity GPU Kernels☆567Nov 7, 2025Updated 4 months ago
- A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node☆63Dec 19, 2025Updated 2 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…☆275Updated this week
- ☆28Mar 17, 2024Updated last year
- Using FlexAttention to compute attention with different masking patterns☆47Sep 22, 2024Updated last year
- Supercharge Your LLM with the Fastest KV Cache Layer☆7,272Updated this week
- Efficient and easy multi-instance LLM serving☆528Sep 3, 2025Updated 6 months ago
- Kubernetes-native AI serving platform for scalable model serving.☆233Feb 28, 2026Updated last week
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆165Feb 11, 2026Updated 3 weeks ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving.☆716Feb 28, 2026Updated last week
- ☆14Jun 10, 2025Updated 8 months ago
- golang package to provide lightweight internal pub/sub for goroutines☆29Jan 23, 2014Updated 12 years ago
- Community maintained hardware plugin for vLLM on AWS Neuron☆23Feb 26, 2026Updated last week
- ☆15Aug 7, 2025Updated 7 months ago
- Training and testing pipeline for ransomware classification based on screenshots of the splash screens or ransom notes (https://arxiv.org…☆11Jul 19, 2020Updated 5 years ago
- A web interface for SleekDB written in PHP☆11Jan 22, 2022Updated 4 years ago
- ☆20May 24, 2025Updated 9 months ago
- GeminiFS: A Companion File System for GPUs☆71Feb 18, 2025Updated last year
- Health checks for Azure N- and H-series VMs.☆57Feb 5, 2026Updated last month
- Argon2 key derivation for Ruby☆11Feb 19, 2026Updated 2 weeks ago
- ☆18Jun 18, 2025Updated 8 months ago
- Ask Poddy: Run Open Source LLMs and Embeddings as OpenAI-Compatible Serverless Endpoints (Tutorial)☆11Jul 19, 2024Updated last year