Snowflake-Labs / vllm
☆16 · Updated last month
Alternatives and similar repositories for vllm
Users interested in vllm are comparing it to the libraries listed below.
- Benchmark suite for LLMs from Fireworks.ai ☆84 · Updated last month
- The backend behind the LLM-Perf Leaderboard ☆11 · Updated last year
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆263 · Updated this week
- Simple high-throughput inference library ☆153 · Updated 7 months ago
- A collection of reproducible inference engine benchmarks ☆38 · Updated 8 months ago
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆174 · Updated last week
- LM engine is a library for pretraining/finetuning LLMs ☆102 · Updated this week
- Cray-LM unified training and inference stack ☆22 · Updated 10 months ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- Aana SDK is a powerful framework for building AI-enabled multimodal applications ☆55 · Updated 4 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆94 · Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆62 · Updated 3 months ago
- ☆31 · Updated last year
- Google TPU optimizations for transformers models ☆131 · Updated last week
- ☆47 · Updated last year
- Example ML projects that use the Determined library ☆32 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated 3 weeks ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆60 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server ☆46 · Updated this week
- Experiments with inference on Llama ☆103 · Updated last year
- Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF trainer finetuning ☆52 · Updated 10 months ago
- ☆273 · Updated last week
- Easy and Efficient Quantization for Transformers ☆202 · Updated 6 months ago
- ML/DL Math and Method notes ☆65 · Updated 2 years ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆354 · Updated this week
- A fork of SGLang for hip-attention integration; please refer to hip-attention for details ☆18 · Updated this week
- ☆52 · Updated last year
- ☆113 · Updated last month
- Intel Gaudi's Megatron DeepSpeed large language models for training ☆16 · Updated last year
- Code for the NeurIPS LLM Efficiency Challenge ☆59 · Updated last year