Relaxed-System-Lab / HexGenLinks

[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.

☆30

Alternatives and similar repositories for HexGen

Users that are interested in HexGen are comparing it to the libraries listed below

Sorting:

LoongServe / LoongServe
☆124Updated 11 months ago
hao-ai-lab / MuxServe
☆74Updated this week
tonyzhao-jt / LLM-PQ
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …
☆34Updated last month
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆213Updated 2 months ago
WukLab / preble
Stateful LLM Serving
☆86Updated 7 months ago
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆88Updated 2 years ago
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆155Updated last year
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, ”Llumnix: Dynamic Scheduling for Large Language Model Serving“
☆62Updated last year
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆60Updated 11 months ago
SymbioticLab / Oobleck
A resilient distributed training framework
☆95Updated last year
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆118Updated last month
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆278Updated 7 months ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆116Updated last year
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆129Updated last year
JF-D / Parcae
☆21Updated last year
kungfu-team / tenplex
Dynamic resources changes for multi-dimensional parallelism training
☆29Updated 2 months ago
awslabs / optimizing-multitask-training-through-dynamic-pipelines
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
☆20Updated last year
YaoJiayi / CacheBlend
☆141Updated 3 months ago
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46Updated last year
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆155Updated 3 weeks ago
Hsword / Awesome-Machine-Learning-System-Papers
☆77Updated 3 years ago
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆186Updated last year
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆181Updated last week
zhengzangw / Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆92Updated 2 years ago
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆67Updated last week
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆135Updated 9 months ago
NetX-lab / Ayo
[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo
☆47Updated 2 months ago
chenhongyu2048 / LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
☆120Updated 4 months ago
pku-liang / ArkVale
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆43Updated 10 months ago
LLMServe / dLoRA-artifact
☆26Updated last year