vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

☆71,883

Alternatives and similar repositories for vllm

Users interested in vllm compare it to the libraries listed below.

sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆23,905 · Updated this week
hiyouga / LlamaFactory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
☆67,659 · Feb 27, 2026 · Updated last week
ggml-org / llama.cpp
LLM inference in C/C++
☆96,322 · Updated this week
unslothai / unsloth
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
☆53,029 · Updated this week
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆22,460 · Updated this week
deepspeedai / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆41,706 · Feb 27, 2026 · Updated last week
langchain-ai / langchain
🦜🔗 The platform for reliable agents.
☆127,809 · Updated this week
lm-sys / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆39,426 · Jun 2, 2025 · Updated 9 months ago
run-llama / llama_index
LlamaIndex is the leading document agent and OCR platform
☆47,374 · Updated this week
NVIDIA / TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat…
☆12,993 · Updated this week
ollama / ollama
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
☆163,632 · Updated this week
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,788 · Jan 8, 2026 · Updated last month
BerriAI / litellm
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a…
☆37,083 · Feb 27, 2026 · Updated last week
huggingface / transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model…
☆157,071 · Feb 27, 2026 · Updated last week
huggingface / peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆20,717 · Updated this week
microsoft / autogen
A programming framework for agentic AI
☆54,956 · Jan 22, 2026 · Updated last month
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆32,519 · Updated this week
huggingface / trl
Train transformer language models with reinforcement learning.
☆17,523 · Updated this week
meta-llama / llama
Inference code for Llama models
☆59,183 · Jan 26, 2025 · Updated last year
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆15,461 · Updated this week
InternLM / lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
☆7,645 · Updated this week
verl-project / verl
verl: Volcano Engine Reinforcement Learning for LLMs
☆19,519 · Updated this week
haotian-liu / LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,500 · Aug 12, 2024 · Updated last year
microsoft / graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
☆31,162 · Updated this week
langgenius / dify
Production-ready platform for agentic workflow development.
☆130,750 · Updated this week
mem0ai / mem0
Universal memory layer for AI Agents
☆48,604 · Updated this week
triton-lang / triton
Development repository for the Triton language and compiler
☆18,501 · Updated this week
QwenLM / Qwen3
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
☆26,713 · Jan 9, 2026 · Updated last month
hpcaitech / ColossalAI
Making large AI models cheaper, faster and more accessible
☆41,364 · Updated this week
ray-project / ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
☆41,516 · Updated this week
open-webui / open-webui
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
☆125,513 · Updated this week
mlc-ai / mlc-llm
Universal LLM Deployment Engine with ML Compilation
☆22,082 · Updated this week
infiniflow / ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to creat…
☆73,900 · Updated this week
facebookresearch / faiss
A library for efficient similarity search and clustering of dense vectors.
☆39,255 · Updated this week
milvus-io / milvus
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
☆43,056 · Updated this week
huggingface / open-r1
Fully open reproduction of DeepSeek-R1
☆25,910 · Nov 24, 2025 · Updated 3 months ago
gradio-app / gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
☆41,921 · Updated this week
EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆11,540 · Updated this week
OpenHands / OpenHands
🙌 OpenHands: AI-Driven Development
☆68,459 · Updated this week