vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆56,349 Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm compare it to the libraries listed below.
- SGLang is a fast serving framework for large language models and vision language models. ☆17,106 Updated last week
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less… ☆44,634 Updated last week
- Fast and memory-efficient exact attention ☆19,099 Updated this week
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizati… ☆11,437 Updated this week
- LLM inference in C/C++ ☆85,425 Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆6,920 Updated this week
- Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) ☆56,869 Updated this week
- Large Language Model Text Generation Inference ☆10,455 Updated this week
- Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud. ☆24,200 Updated 3 weeks ago
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆11,088 Updated last month
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆19,390 Updated last week
- Train transformer language models with reinforcement learning. ☆15,259 Updated this week
- Fully open reproduction of DeepSeek-R1 ☆25,345 Updated 2 weeks ago
- Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sag… ☆28,040 Updated this week
- Go ahead and axolotl questions ☆10,289 Updated this week
- Universal LLM Deployment Engine with ML Compilation ☆21,192 Updated this week
- The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. ☆19,148 Updated 3 weeks ago
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ☆17,788 Updated this week
- DSPy: The framework for programming—not prompting—language models ☆27,551 Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆39,899 Updated this week
- Python bindings for llama.cpp ☆9,515 Updated 2 weeks ago
- A framework for few-shot evaluation of language models. ☆9,906 Updated last week
- verl: Volcano Engine Reinforcement Learning for LLMs ☆12,563 Updated last week
- Retrieval and Retrieval-augmented LLMs ☆10,406 Updated this week
- Inference code for Llama models ☆58,668 Updated 7 months ago
- Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you ne… ☆8,437 Updated this week
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆23,378 Updated last year
- Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models. ☆151,103 Updated this week
- High-speed Large Language Model Serving for Local Deployment ☆8,315 Updated 3 weeks ago
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,642 Updated last year