vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
★33,809 Updated this week
Alternatives and similar repositories for vllm:
Users interested in vllm are comparing it to the libraries listed below.
- Large Language Model Text Generation Inference ★9,583 Updated this week
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. ★37,496 Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★16,978 Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ★7,305 Updated this week
- QLoRA: Efficient Finetuning of Quantized LLMs ★10,161 Updated 7 months ago
- Python bindings for llama.cpp ★8,420 Updated last week
- Universal LLM Deployment Engine with ML Compilation ★19,608 Updated this week
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ★15,910 Updated this week
- Finetune Llama 3.3, Mistral, Phi-4, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory ★20,611 Updated this week
- Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) ★38,227 Updated this week
- LlamaIndex is the leading framework for building LLM-powered agents over your data. ★38,057 Updated this week
- Tensor library for machine learning ★11,541 Updated this week
- LLM inference in C/C++ ★70,713 Updated this week
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ★11,146 Updated this week
- Inference code for Llama models ★57,213 Updated 4 months ago
- Instruct-tune LLaMA on consumer hardware ★18,758 Updated 5 months ago
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ★21,096 Updated 5 months ago
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ★8,110 Updated 8 months ago
- Fast and memory-efficient exact attention ★15,064 Updated this week
- Go ahead and axolotl questions ★8,293 Updated this week
- Train transformer language models with reinforcement learning. ★10,609 Updated this week
- Run any open-source LLMs, such as Llama, Mistral, as OpenAI compatible API endpoint in the cloud. ★10,380 Updated this week
- Retrieval and Retrieval-augmented LLMs ★8,237 Updated this week
- LLMs built upon Evol Instruct: WizardLM, WizardCoder, WizardMath ★9,313 Updated 5 months ago
- the AI-native open-source embedding database ★16,987 Updated this week
- High-speed Large Language Model Serving on PCs with Consumer-grade GPUs ★8,050 Updated 4 months ago
- RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)… ★13,005 Updated this week
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ★6,749 Updated 6 months ago
- Accessible large language models via k-bit quantization for PyTorch. ★6,522 Updated this week
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain… ★9,147 Updated this week