vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆61,727 Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- SGLang is a fast serving framework for large language models and vision language models. ☆19,462 Updated this week
- LlamaIndex is the leading framework for building LLM-powered agents over your data. ☆45,029 Updated this week
- Large Language Model Text Generation Inference ☆10,605 Updated last month
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM. ☆47,705 Updated this week
- Fast and memory-efficient exact attention ☆20,280 Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆19,959 Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆7,223 Updated this week
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. ☆39,199 Updated 5 months ago
- LLM inference in C/C++ ☆88,512 Updated this week
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ☆17,979 Updated last week
- 🦜🔗 Build context-aware reasoning applications ☆118,418 Updated this week
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… ☆12,008 Updated this week
- Go ahead and axolotl questions ☆10,716 Updated this week
- Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) ☆60,848 Updated last week
- Fully open reproduction of DeepSeek-R1 ☆25,581 Updated last month
- Inference code for Llama models ☆58,886 Updated 9 months ago
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆40,538 Updated this week
- Train transformer language models with reinforcement learning. ☆16,106 Updated this week
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆23,870 Updated last year
- Python bindings for llama.cpp ☆9,697 Updated 2 months ago
- Tensor library for machine learning ☆13,332 Updated last week
- A framework for few-shot evaluation of language models. ☆10,488 Updated this week
- tiktoken is a fast BPE tokeniser for use with OpenAI's models. ☆16,404 Updated 3 weeks ago
- Retrieval and Retrieval-augmented LLMs ☆10,772 Updated last week
- Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sag… ☆30,339 Updated last week
- Open-source search and retrieval database for AI applications. ☆24,161 Updated this week
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,719 Updated last year
- Instruct-tune LLaMA on consumer hardware ☆18,977 Updated last year
- Code and documentation to train Stanford's Alpaca models, and generate the data. ☆30,190 Updated last year
- Universal LLM Deployment Engine with ML Compilation ☆21,527 Updated last week