Thesys-lab/Helix-ASPLOS25

Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"

☆93

Alternatives and similar repositories for Helix-ASPLOS25

Users who are interested in Helix-ASPLOS25 are comparing it to the repositories listed below.

gudiandian / ElasticFlow
☆17 · May 10, 2024 · Updated 2 years ago
Relaxed-System-Lab / HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆37 · May 6, 2024 · Updated 2 years ago
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆135 · Feb 22, 2024 · Updated 2 years ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆511 · Jan 8, 2026 · Updated 6 months ago
alibaba / ServeGen
A framework for generating realistic LLM serving workloads
☆161 · May 11, 2026 · Updated 2 months ago
netiken / m4
[TBD] "m4: A Learned Flow-level Network Simulator" by Chenning Li, Anton A. Zabreyko, Om Chabra, Arash Nasr-Esfahany, Kevin Zhao, Pratees…
☆21 · Jun 19, 2026 · Updated last month
UMass-LIDS / Proteus
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
☆13 · Mar 7, 2024 · Updated 2 years ago
tonyzhao-jt / LLM-PQ
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …
☆39 · Aug 29, 2025 · Updated 10 months ago
microsoft / elasticflow-traces
Integrated Training Platform (ITP) traces used in ElasticFlow paper.
☆31 · Dec 23, 2022 · Updated 3 years ago
tyler-griggs / melange-release
☆48 · Jun 27, 2024 · Updated 2 years ago
spcl / crosspipe
Official implementation of CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training (ATC '25), built on top of Megatro…
☆17 · Jul 6, 2025 · Updated last year
gty111 / gLLM
An Efficient and Versatile Inference Engine for Distributed LLM Serving
☆66 · Updated this week
microsoft / vidur
Accurate, large-scale, and extensible simulator for LLM inference Systems
☆642 · Jul 25, 2025 · Updated 11 months ago
ConnectedSystemsLab / StarCDN-Simulator
Artifact for StarCDN's simulation framework
☆17 · Apr 14, 2026 · Updated 3 months ago
LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆826 · Apr 6, 2025 · Updated last year
eth-easl / sailor
AI model training on heterogeneous, geo-distributed resources
☆46 · Nov 24, 2025 · Updated 7 months ago
hao-ai-lab / MuxServe
☆90 · Oct 17, 2025 · Updated 9 months ago
JF-D / Parcae
☆22 · Apr 22, 2024 · Updated 2 years ago
lt2000 / MinFlow
☆12 · Jan 12, 2024 · Updated 2 years ago
infinigence / HamiltonAttention
☆45 · Oct 15, 2025 · Updated 9 months ago
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆24 · Nov 21, 2024 · Updated last year
shaojiawei07 / Branchy-GNN
☆23 · May 29, 2023 · Updated 3 years ago
Networked-System-and-Security-Group / Themis
ICNP'25-THEMIS: Addressing Congestion-Induced Unfairness in Long-Haul RDMA Networks
☆16 · Jun 27, 2026 · Updated 3 weeks ago
SymbioticLab / tensorflow-salus
tensorflow fork with Salus integration
☆12 · Jan 7, 2022 · Updated 4 years ago
Leosang-lx / FlowSpec
Continuous Pipelined Speculative Decoding
☆21 · May 25, 2026 · Updated last month
romilbhardwaj / cilantro
Source code for OSDI 2023 paper titled "Cilantro - Performance-Aware Resource Allocation for General Objectives via Online Feedback"
☆41 · Jul 6, 2023 · Updated 3 years ago
ServerlessLLM / ServerlessLLM
Serverless LLM Serving for Everyone.
☆692 · May 4, 2026 · Updated 2 months ago
MachineLearningSystem / 25ASPLOS-Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆12 · Nov 8, 2024 · Updated last year
JasonNing96 / DSSD-Efficient-Edge-Computing
The speculative decoding for uav environment
☆17 · Oct 18, 2025 · Updated 9 months ago
thustorage / GPreempt
☆25 · May 18, 2025 · Updated last year
zenrran4nlp / Awesome-LLM-Inference-Serving
☆50 · Apr 29, 2025 · Updated last year
microsoft / SwitchML
Switch-based Training Acceleration for Machine Learning (SwitchML)
☆16 · Apr 13, 2021 · Updated 5 years ago
chenyu-jiang / dcp
Code repository for the SOSP'25 paper DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.
☆21 · Nov 28, 2025 · Updated 7 months ago
LoongServe / LoongServe
☆135 · Nov 11, 2024 · Updated last year
WukLab / preble
Stateful LLM Serving
☆105 · Mar 11, 2025 · Updated last year
HiEST / gpu-topo-aware
GPU topology-aware scheduler
☆13 · Jul 7, 2017 · Updated 9 years ago
msr-fiddle / blox
☆46 · Jul 4, 2024 · Updated 2 years ago
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆56 · May 10, 2024 · Updated 2 years ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆157 · Apr 25, 2024 · Updated 2 years ago