vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆69,622 · Updated last week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below:
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆23,439 · Updated this week
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM. ☆51,922 · Updated this week
- Fast and memory-efficient exact attention ☆22,113 · Updated last week
- Large Language Model Text Generation Inference ☆10,757 · Updated last month
- Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud. ☆26,530 · Updated last month
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… ☆12,867 · Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆7,606 · Updated this week
- LLM inference in C/C++ ☆94,823 · Updated this week
- Ongoing research training transformer models at scale ☆15,162 · Updated this week
- A framework for few-shot evaluation of language models. ☆11,393 · Updated this week
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. ☆39,402 · Updated 8 months ago
- Train transformer language models with reinforcement learning. ☆17,297 · Updated this week
- LlamaIndex is the leading framework for building LLM-powered agents over your data. ☆46,841 · Updated last week
- Retrieval and Retrieval-augmented LLMs ☆11,280 · Updated last month
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆35,429 · Updated this week
- Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) ☆67,023 · Updated last week
- A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations ☆16,501 · Updated this week
- verl: Volcano Engine Reinforcement Learning for LLMs ☆19,132 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆20,619 · Updated this week
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆13,234 · Updated last week
- Go ahead and axolotl questions ☆11,251 · Updated last week
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ☆18,190 · Updated 3 months ago
- Python bindings for llama.cpp ☆9,971 · Updated 5 months ago
- Tensor library for machine learning ☆13,923 · Updated this week
- Open-source search and retrieval database for AI applications. ☆26,064 · Updated this week
- 🦜🔗 The platform for reliable agents. ☆126,317 · Updated this week
- The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. ☆20,322 · Updated 2 weeks ago
- Fully open reproduction of DeepSeek-R1 ☆25,866 · Updated 2 months ago
- Inference code for Llama models ☆59,141 · Updated last year
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆30,872 · Updated this week