Snowflake-Labs / vllm
☆15 · Updated last month
Alternatives and similar repositories for vllm
Users interested in vllm are comparing it to the libraries listed below.
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated 2 weeks ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- A collection of reproducible inference engine benchmarks ☆37 · Updated 6 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆235 · Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 · Updated last month
- PyTorch DTensor-native training library for LLMs/VLMs with OOTB Hugging Face support ☆141 · Updated this week
- The backend behind the LLM-Perf Leaderboard ☆11 · Updated last year
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆288 · Updated last week
- Example ML projects that use the Determined library. ☆32 · Updated last year
- Experiments with inference on Llama ☆103 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆60 · Updated last year
- ☆52 · Updated last year
- ☆21 · Updated 8 months ago
- Google TPU optimizations for Transformers models ☆121 · Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆42 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆92 · Updated this week
- ☆46 · Updated last year
- Easy and Efficient Quantization for Transformers ☆202 · Updated 4 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆59 · Updated 2 weeks ago
- Make Triton easier ☆48 · Updated last year
- Code repository for the paper "AdANNS: A Framework for Adaptive Semantic Search" ☆65 · Updated 2 years ago
- Simple and efficient DeepSeek V3 SFT using pipeline parallelism and expert parallelism, with both FP8 and BF16 training ☆88 · Updated 3 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated last year
 - Repository for CPU Kernel Generation for LLM Inference