KuntaiDu / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

☆11

Alternatives and similar repositories for vllm:

Users who are interested in vllm are comparing it to the libraries listed below.

alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆59 Updated 7 months ago
mental2008 / awesome-papers
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…
☆62 Updated this week
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆109 Updated 10 months ago
S-Lab-System-Group / Hydro
Surrogate-based Hyperparameter Tuning System
☆28 Updated last year
SymbioticLab / Oobleck
A resilient distributed training framework
☆88 Updated 9 months ago
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆98 Updated this week
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆142 Updated 3 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆205 Updated 3 weeks ago
LoongServe / LoongServe
☆80 Updated 2 months ago
Youhe-Jiang / IJCAI2023-OptimalShardedDataParallel
[IJCAI2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte…
☆51 Updated last year
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆278 Updated this week
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆119 Updated 8 months ago
S-Lab-System-Group / Lucid
Lucid: A Non-Intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs
☆51 Updated last year
stanford-mast / INFaaS
Model-less Inference Serving
☆83 Updated last year
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆59 Updated 8 months ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆296 Updated 4 months ago
microsoft / SuperScaler
An experimental parallel training platform
☆54 Updated 9 months ago
pengyanghua / optimus
A Deep Learning Cluster Scheduler
☆37 Updated 4 years ago
AlibabaPAI / torchacc
PyTorch distributed training acceleration framework
☆38 Updated this week
Mutinifni / splitwise-sim
LLM serving cluster simulator
☆89 Updated 8 months ago
tonyzhao-jt / LLM-PQ
Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"
☆29 Updated 10 months ago
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆53 Updated 8 months ago
Hsword / Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, …
☆106 Updated last year
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆133 Updated 3 months ago
thu-pacman / FasterMoE
☆72 Updated 2 years ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆87 Updated last week
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆53 Updated 4 months ago
zhengzangw / Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆81 Updated last year
HarryWu99 / llm_kvcache_sparsity
Implement some method of LLM KV Cache Sparsity
☆30 Updated 7 months ago
WukLab / preble
Stateful LLM Serving
☆44 Updated 5 months ago