llm-semantic-router / vllm-router
vLLM Router
☆55Mar 11, 2024Updated last year
Alternatives and similar repositories for vllm-router
Users that are interested in vllm-router are comparing it to the libraries listed below
- Whisper in TensorRT-LLM☆17Sep 21, 2023Updated 2 years ago
- ☆13Jan 7, 2025Updated last year
- [ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation☆22May 29, 2025Updated 8 months ago
- ☆27Jan 7, 2025Updated last year
- A high-level API wrapping Paddle Inference for fast deployment☆33Nov 13, 2021Updated 4 years ago
- NVIDIA TensorRT Hackathon 2023 semifinal topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM☆43Oct 20, 2023Updated 2 years ago
- ☆141Apr 23, 2024Updated last year
- ☆21Mar 22, 2021Updated 4 years ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.☆123Dec 25, 2025Updated last month
- DeepSparkInference has selected 216 inference models of both small and large sizes. The small models cover fields such as computer vision…☆27Updated this week
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).☆25Jul 15, 2025Updated 7 months ago
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"☆101Dec 15, 2025Updated 2 months ago
- ☆131Nov 11, 2024Updated last year
- ☆34Feb 3, 2025Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.☆32Sep 19, 2025Updated 4 months ago
- GPU operators for sparse tensor operations☆35Mar 11, 2024Updated last year
- Text deduplication☆78May 23, 2024Updated last year
- Fast LLM training codebase with dynamic strategy selection [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]☆40Jan 4, 2024Updated 2 years ago
- LLMs as Collaboratively Edited Knowledge Bases☆46Feb 8, 2026Updated last week
- Dynamic Memory Management for Serving LLMs without PagedAttention☆461May 30, 2025Updated 8 months ago
- 🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.☆251Updated this week
- Kinematic and dynamic models of continuum and articulated soft robots.☆15Nov 22, 2025Updated 2 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifying image understanding and generation.☆39Jun 20, 2024Updated last year
- This project showcases engaging interactions between two AI chatbots.☆10Jan 10, 2024Updated 2 years ago
- ☆10Jan 23, 2025Updated last year
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 4 months ago
- ☆32Sep 27, 2012Updated 13 years ago
- A Simple, Explainable Vision Language Model for detecting manufacturing defects in products☆14Sep 23, 2025Updated 4 months ago
- This project is based on the [LTX-Video](https://github.com/Lightricks/LTX-Video) algorithm of the diffusers and optimized and accelerate…☆12Dec 31, 2024Updated last year
- Protocol buffers and other common resources.☆13Jan 20, 2026Updated 3 weeks ago
- Stateful LLM Serving☆95Mar 11, 2025Updated 11 months ago
- Efficient LLM Inference over Long Sequences☆392Jun 25, 2025Updated 7 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts☆40Feb 29, 2024Updated last year
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆96Sep 13, 2025Updated 5 months ago
- A lightweight design for computation-communication overlap.☆221Jan 20, 2026Updated 3 weeks ago
- An easy-to-use package for implementing SmoothQuant for LLMs☆110Apr 7, 2025Updated 10 months ago
- Just some simple css & html that provides a clean looking bracket.☆14Mar 19, 2019Updated 6 years ago
- AI application service platform☆28Nov 12, 2025Updated 3 months ago
- An example of how to use `camera_ros` with Raspberry Pi Cameras modules inside an arm64v8/ros:jazzy docker container, running on top of R…☆13Feb 20, 2025Updated 11 months ago