vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆66,313 · Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆22,092 · Updated this week
- LlamaIndex is the leading framework for building LLM-powered agents over your data. ☆46,055 · Updated last week
- Large Language Model Text Generation Inference ☆10,716 · Updated 2 weeks ago
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM. ☆49,952 · Updated last week
- Fast and memory-efficient exact attention ☆21,317 · Updated last week
- Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) ☆64,621 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆20,347 · Updated 2 weeks ago
- 🦜🔗 The platform for reliable agents. ☆123,095 · Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆7,437 · Updated last week
- Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models. ☆158,406 · Updated last week
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… ☆12,481 · Updated this week
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆24,220 · Updated last year
- DSPy: The framework for programming—not prompting—language models ☆31,066 · Updated last week
- The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. ☆20,037 · Updated last month
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆33,023 · Updated this week
- Train transformer language models with reinforcement learning. ☆16,809 · Updated last week
- Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud. ☆12,018 · Updated last week
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆30,060 · Updated this week
- Open-source search and retrieval database for AI applications. ☆25,248 · Updated this week
- A framework for few-shot evaluation of language models. ☆11,069 · Updated last week
- LLM inference in C/C++ ☆92,287 · Updated this week
- Retrieval and Retrieval-augmented LLMs ☆11,055 · Updated 2 weeks ago
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,805 · Updated last year
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. ☆39,332 · Updated 7 months ago
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" ☆13,106 · Updated last year
- Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud. ☆25,950 · Updated 2 months ago
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ☆18,121 · Updated last month
- Tensor library for machine learning ☆13,764 · Updated 2 weeks ago
- Ongoing research training transformer models at scale ☆14,758 · Updated this week
- Python bindings for llama.cpp ☆9,851 · Updated 4 months ago