cakeng / ASPEN

This is the proof-of-concept CPU implementation of ASPEN used for the NeurIPS'23 paper ASPEN: Breaking Operator Barriers for Efficient Parallelization of Deep Neural Networks.

☆13

Alternatives and similar repositories for ASPEN

Users who are interested in ASPEN are comparing it to the repositories listed below.

casys-kaist / EnvPipe
☆25 Updated 2 years ago
msr-fiddle / dnn-partitioning
☆41 Updated 5 years ago
tonyzhao-jt / LLM-PQ
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …
☆35 Updated 2 months ago
zhuangwang93 / Espresso
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '2…
☆15 Updated 2 years ago
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆35 Updated 2 years ago
UMass-LIDS / Proteus
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
☆13 Updated last year
iankur / vqllm
Residual vector quantization for KV cache compression in large language models
☆10 Updated last year
sjtu-epcc / DVABatch
☆21 Updated 3 years ago
ParCIS / Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…
☆27 Updated 2 years ago
UofT-EcoSystem / hfta
Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion
☆32 Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆91 Updated 2 years ago
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆14 Updated 11 months ago
PatrickGuo / Mistify
☆10 Updated 4 years ago
uw-mad-dash / Accordion
Code for reproducing experiments performed for Accordion
☆13 Updated 4 years ago
microsoft / taccl
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
☆77 Updated 2 years ago
awslabs / optimizing-multitask-training-through-dynamic-pipelines
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
☆20 Updated last year
eth-easl / sailor
AI model training on heterogeneous, geo-distributed resources
☆19 Updated this week
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆23 Updated 6 months ago
humuyan / Korch
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
☆38 Updated 7 months ago
SophiaLi06 / BytePS_THC
THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
☆20 Updated last year
SymbioticLab / Oobleck
A resilient distributed training framework
☆96 Updated last year
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆59 Updated 7 months ago
Soroosh129 / NeuOS
Source code for the paper: "A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems"
☆22 Updated 4 years ago
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆71 Updated last month
hao-ai-lab / MuxServe
☆79 Updated last month
mutinifni / splitwise-sim
LLM serving cluster simulator
☆120 Updated last year
ranggihwang / Pregated_MoE
☆57 Updated last year
hku-systems / naspipe
☆14 Updated 3 years ago
JF-D / Proteus
☆23 Updated last year
qipengwang / Melon
MobiSys#114
☆22 Updated 2 years ago