A high-throughput and memory-efficient inference and serving engine for LLMs
☆41 · Jan 26, 2025 · Updated last year
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- ☆16 · May 16, 2025 · Updated 11 months ago
- Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn. ☆15 · Jan 12, 2026 · Updated 3 months ago
- ☆50 · Nov 3, 2025 · Updated 6 months ago
- ☆14 · Jul 5, 2024 · Updated last year
- ☆45 · Jun 19, 2025 · Updated 10 months ago
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert… ☆16 · Feb 4, 2025 · Updated last year
- TAP parser for .NET ☆26 · Sep 19, 2019 · Updated 6 years ago
- my personal mcp server ☆13 · Apr 23, 2025 · Updated last year
- Reference solutions for the online judge (OJ) of the Data and Algorithms course, Department of Electronic Engineering, Tsinghua University, Fall 2022 semester ☆10 · Jun 18, 2023 · Updated 2 years ago
- Topic: Computational Cognitive Science. This repository originated as the IA003 course-completion BP; it is still in its infancy, and its content layout has yet to be finalized. The next large-scale update is expected in three or four years. ☆17 · May 22, 2019 · Updated 6 years ago
- Chef cookbooks for managing a Ceph cluster ☆12 · Apr 2, 2023 · Updated 3 years ago
- Official implementation of "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought" (NeurIPS 2025) ☆39 · Oct 8, 2025 · Updated 6 months ago
- Code and data for the paper "Steering Conversational Large Language Models for Long Emotional Support Conversations" along with a UI to v… ☆15 · Apr 14, 2025 · Updated last year
- my dockerfiles ☆13 · Mar 29, 2026 · Updated last month
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference" ☆22 · Feb 16, 2025 · Updated last year
- Code of EMNLP 2025 paper 'UltraIF: Advancing Instruction Following from the Wild'. ☆21 · Apr 3, 2025 · Updated last year
- MeloTTS demo on Axera ☆12 · Nov 18, 2025 · Updated 5 months ago
- Reproduced the DFT method without using Verl. https://arxiv.org/abs/2508.05629 ☆23 · Oct 14, 2025 · Updated 6 months ago
- NSCSCC 2020 - Yet Another MIPS Processor ☆14 · Aug 7, 2021 · Updated 4 years ago
- [EMNLP 2025 Findings] Retrieval-Augmented Machine Translation with Unstructured Knowledge ☆15 · Sep 4, 2025 · Updated 8 months ago
- Accelerate Transformers pipelines using ONNX Runtime. ☆10 · Jun 5, 2020 · Updated 5 years ago
- Java implementation of Bert Tokenizer; supports outputting ONNX tensors for ONNX model inference ☆13 · Sep 4, 2023 · Updated 2 years ago
- ☆21 · Jun 16, 2025 · Updated 10 months ago
- Implementation of paper 'Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference' [NeurIPS'24… ☆26 · Jun 14, 2024 · Updated last year
- Distill CPM-1 ☆18 · May 6, 2021 · Updated 5 years ago
- ☆19 · Mar 25, 2024 · Updated 2 years ago
- ☆25 · Mar 8, 2026 · Updated last month
- A toolkit to assess data privacy in LLMs (under development) ☆72 · Jan 2, 2025 · Updated last year
- qwen-nsa ☆87 · Oct 14, 2025 · Updated 6 months ago
- Entry for the MIPS track of the 2023 Loongson Cup ☆14 · Dec 23, 2023 · Updated 2 years ago
- LLM KV Cache compression - K+V dual compression, 73-99% VRAM savings, zero accuracy loss ☆51 · Mar 30, 2026 · Updated last month
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang ☆44 · Nov 19, 2025 · Updated 5 months ago
- The implementation of Text Classification with Negative Supervision (ACL, 2020) ☆10 · Oct 8, 2020 · Updated 5 years ago
- Electronic Engineering Department track of the 8th Tsinghua University Artificial Intelligence Challenge (formerly the department's 26th Team Programming Contest, teamstyle26) ☆16 · Updated this week
- ☆17 · Oct 24, 2024 · Updated last year
- Tokenize Chinese corpora using Sentencepiece ☆13 · Nov 30, 2023 · Updated 2 years ago
- Run AI models locally and offline on your machine ☆11 · May 8, 2025 · Updated 11 months ago
- 👂 Typing is slow, talk to me. The project name means 'I am tired' in Chinese (我累了). This is an AI efficiency assistant, complete your d… ☆16 · Jun 8, 2024 · Updated last year
- ☆41 · Apr 11, 2023 · Updated 3 years ago