vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
★62,548 · Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM. ★48,036 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ★20,075 · Updated this week
- Large Language Model Text Generation Inference ★10,643 · Updated this week
- Fast and memory-efficient exact attention ★20,414 · Updated last week
- LLM inference in C/C++ ★89,278 · Updated this week
- Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) ★62,211 · Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ★7,231 · Updated this week
- Go ahead and axolotl questions ★10,753 · Updated this week
- Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sag… ★30,866 · Updated this week
- Python bindings for llama.cpp ★9,715 · Updated 2 months ago
- Tensor library for machine learning ★13,532 · Updated this week
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… ★12,069 · Updated this week
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ★18,022 · Updated last week
- Open-source search and retrieval database for AI applications. ★24,264 · Updated this week
- QLoRA: Efficient Finetuning of Quantized LLMs ★10,744 · Updated last year
- Retrieval and Retrieval-augmented LLMs ★10,802 · Updated 3 weeks ago
- LlamaIndex is the leading framework for building LLM-powered agents over your data. ★45,140 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★20,002 · Updated last week
- A framework for few-shot evaluation of language models. ★10,553 · Updated 2 weeks ago
- Train transformer language models with reinforcement learning. ★16,228 · Updated this week
- Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud. ★25,335 · Updated last month
- The official Meta Llama 3 GitHub site ★29,072 · Updated 9 months ago
- Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 17+ clouds, o… ★8,910 · Updated last week
- A modular graph-based Retrieval-Augmented Generation (RAG) system ★29,014 · Updated last week
- Inference Llama 2 in one file of pure C ★18,912 · Updated last year
- Universal LLM Deployment Engine with ML Compilation ★21,590 · Updated last week
- Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud. ★11,927 · Updated this week
- MLX: An array framework for Apple silicon ★22,755 · Updated this week
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ★12,220 · Updated last month
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. ★39,240 · Updated 5 months ago