zhuangwang93 / CupcakeLinks

Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training (MLSys '23)

☆9

Alternatives and similar repositories for Cupcake

Users that are interested in Cupcake are comparing it to the libraries listed below

Sorting:

zhuangwang93 / Espresso
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '2…
☆15Updated last year
gudiandian / ElasticFlow
☆16Updated last year
uw-mad-dash / shockwave
Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
☆44Updated 2 years ago
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆25Updated 7 months ago
romilbhardwaj / cilantro
Source code for OSDI 2023 paper titled "Cilantro - Performance-Aware Resource Allocation for General Objectives via Online Feedback"
☆39Updated 2 years ago
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆35Updated 2 years ago
Rivendile / Muri
Artifacts for our SIGCOMM'22 paper Muri
☆42Updated last year
harvard-cns / Harvard-CNS-Seminar
Reading seminar in Harvard Cloud Networking and Systems Group
☆16Updated 2 years ago
suquark / hoplite
☆45Updated 3 years ago
chaojin0310 / Ditto
Artifacts for our SIGCOMM'23 paper Ditto
☆15Updated last year
S-Lab-System-Group / Primo
Primo: Practical Learning-Augmented Systems with Interpretable Models
☆19Updated last year
H-Huang / torch_collective_extension
A minimum demo for PyTorch distributed extension functionality for collectives.
☆12Updated 11 months ago
ParCIS / Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…
☆26Updated 2 years ago
thustorage / Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆24Updated 2 months ago
sands-lab / omnireduce
☆69Updated 2 years ago
Froot-NetSys / Arya
Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling
☆13Updated last year
axio-project / FuseLink
Efficient GPU communication over multiple NICs.
☆10Updated last week
pkusys / Auncel
Vector search with bounded performance.
☆35Updated last year
RC4ML / RPCNIC
RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]
☆11Updated 7 months ago
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆53Updated 7 months ago
alibaba-edu / qwen-bailian-usagetraces-anon
☆26Updated last month
smartnickit-project / smartnic-bench
A rust-based benchmark for BlueField SmartNICs.
☆28Updated 2 years ago
liangyuRain / ForestColl
☆12Updated 2 months ago
S-Lab-System-Group / Awesome-ML-for-System
SOTA Learning-augmented Systems
☆36Updated 3 years ago
msr-fiddle / synergy
☆51Updated 2 years ago
utnslab / Medes
Deduplication over dis-aggregated memory for Serverless Computing
☆13Updated 3 years ago
uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆50Updated 2 years ago
crazyboycjr / nethint
The prototype for NSDI paper "NetHint: White-Box Networking for Multi-Tenant Data Centers"
☆26Updated last year
casys-kaist / EnvPipe
☆25Updated last year
bytedance / QSync
Official resporitory for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20Updated last year