llm-semantic-router/vllm-router

vLLM Router

☆56

Alternatives and similar repositories for vllm-router

Users who are interested in vllm-router are comparing it to the libraries listed below.

penkow / llama-lambda
Deploying LLama 2 as AWS Lambda function for scalable serverless inference
☆22 · Nov 1, 2023 · Updated 2 years ago
Bruce-Lee-LY / cuda_auto_tune
NCU-driven iterative optimization workflow for CUDA/CUTLASS/Triton/CuTe DSL kernels.
☆23 · Apr 10, 2026 · Updated 3 months ago
mlcommons / training_results_v3.0
This repository contains the results and code for the MLPerf™ Training v3.0 benchmark.
☆12 · Aug 10, 2023 · Updated 2 years ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆242 · Jan 20, 2026 · Updated 6 months ago
microsoft / chunk-attention
☆89 · Apr 18, 2025 · Updated last year
Faraz9877 / H100_GEMM
High-performance GEMM implementation optimized for NVIDIA H100 GPUs, leveraging Hopper architecture's TMA, WGMMA, and Thread Block Cluste…
☆11 · Dec 4, 2024 · Updated last year
JiangLiSJTU / token-ring
☆13 · Jan 7, 2025 · Updated last year
bytedance-iaas / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆14 · Updated this week
0xWelt / VibeRL
VibeRL is a Reinforcement Learning framework built essentially through vibe coding with Kimi K2.
☆17 · Updated this week
shell-nlp / openai_router
OpenAI Router: a lightweight, persistent, zero-configuration unified gateway for the OpenAI API
☆25 · Jul 19, 2026 · Updated last week
MegEngine / mgeconvert
Converter from MegEngine to other frameworks
☆71 · Apr 27, 2023 · Updated 3 years ago
vllm-project / tml-fa4
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17 · Jul 17, 2026 · Updated last week
BDAI-Research / DFLOP
☆17 · Apr 16, 2026 · Updated 3 months ago
Bruce-Lee-LY / cutlass_gemm
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆20 · Aug 3, 2025 · Updated 11 months ago
FeiGeChuanShu / trt2023
NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM
☆43 · Oct 20, 2023 · Updated 2 years ago
Bruce-Lee-LY / cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
☆75 · Sep 8, 2024 · Updated last year
gxinlong / cuda-optimization-skill
A skill for automatically optimizing CUDA code.
☆42 · Mar 26, 2026 · Updated 3 months ago
RitaRamo / extra
Retrieval-augmented Image Captioning
☆13 · Feb 16, 2023 · Updated 3 years ago
LighT-chenml / GPHash
☆14 · Dec 20, 2024 · Updated last year
quantumish / shacuda
Fast SHA-256 that utilizes the GPU
☆13 · Dec 17, 2021 · Updated 4 years ago
LoongServe / LoongServe
☆135 · Nov 11, 2024 · Updated last year
cirquit / presto
☆15 · Jan 21, 2023 · Updated 3 years ago
MegEngine / Documentation
MegEngine Official Documentation
☆38 · Dec 4, 2024 · Updated last year
NascentCore / 3k
Orchestrating many small GPU clusters for running serverless GPU workloads
☆18 · Mar 15, 2026 · Updated 4 months ago
AgentMaker / PaddleQuickInference
A high-level API wrapping Paddle Inference for rapid deployment
☆33 · Nov 13, 2021 · Updated 4 years ago
ajtejankar / mixtral-vis-moe
Visualize expert firing frequencies across sentences in the Mixtral MoE model
☆18 · Dec 22, 2023 · Updated 2 years ago
vllm-project / router
A high-performance and light-weight router for vLLM large scale deployment
☆328 · Updated this week
ljmict / feiQ
A command-line version of FeiQ implemented in Go
☆13 · Nov 26, 2018 · Updated 7 years ago
ssbuild / aigc_evals
aigc evals
☆10 · Dec 2, 2023 · Updated 2 years ago
WukLab / preble
Stateful LLM Serving
☆105 · Mar 11, 2025 · Updated last year
dustinvtran / blog
All code and content for my blog.
☆15 · Sep 23, 2018 · Updated 7 years ago
winter1203 / vllm_GOT2_OCR
Accelerating GOT-OCRv2 with VLLM
☆10 · Nov 15, 2024 · Updated last year
IlyasMoutawwakil / py-txi
A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.
☆32 · Sep 19, 2025 · Updated 10 months ago
DeepLink-org / DLBlas
DLBlas: clean and efficient kernels
☆44 · Updated this week
shyyhs / CourseraParallelCorpusMining
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation
☆15 · Aug 27, 2024 · Updated last year
raywan-110 / AdaQP
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
☆24 · Mar 1, 2024 · Updated 2 years ago
YJHMITWEB / ExFlow
Explore Inter-layer Expert Affinity in MoE Model Inference
☆16 · May 6, 2024 · Updated 2 years ago
aplmikex / deduplication_mnbvc
Text deduplication
☆77 · May 23, 2024 · Updated 2 years ago
instavm / skill-optimization
Demonstration of DSPy optimization for Skill.md files
☆15 · Dec 28, 2025 · Updated 6 months ago