siddharth9820 / MoDNN

Implementation of algorithms for memory-optimized deep neural network training

☆10

Alternatives and similar repositories for MoDNN:

Users who are interested in MoDNN are comparing it to the libraries listed below.

msr-fiddle / dnn-partitioning
☆40 Updated 4 years ago
zhuangwang93 / Espresso
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '2…
☆15 Updated last year
uw-mad-dash / Accordion
Code for reproducing experiments performed for Accordion
☆13 Updated 3 years ago
lzhangbv / dear_pytorch
[ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining
☆12 Updated last year
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆34 Updated 2 years ago
zhuangwang93 / Cupcake
Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training (MLSys '23)
☆9 Updated last year
UofT-EcoSystem / hfta
Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion
☆32 Updated 11 months ago
kanonjz / paper
Machine Learning System
☆14 Updated 4 years ago
ParCIS / Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…
☆25 Updated 2 years ago
reconfigurable-ml-pipeline / ipa
Source code of IPA, https://escholarship.org/uc/item/2p0805dq
☆10 Updated 10 months ago
zoranzhao / DeepThings
A Portable C Library for Distributed CNN Inference on IoT Edge Clusters
☆83 Updated 5 years ago
Distributed-AI / PipeTransformer
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021
☆56 Updated 3 years ago
UofT-EcoSystem / rlscope
RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads
☆43 Updated 4 years ago
sands-lab / omnireduce
☆69 Updated 2 years ago
netiken / m3
[ACM SIGCOMM 2024] "m3: Accurate Flow-Level Performance Estimation using Machine Learning" by Chenning Li, Arash Nasr-Esfahany, Kevin Zha…
☆24 Updated 6 months ago
epfml / sparsifiedSGD
Sparsified SGD with Memory: https://arxiv.org/abs/1809.07599
☆58 Updated 6 years ago
vineeths96 / Gradient-Compression
We present a set of all-reduce compatible gradient compression algorithms which significantly reduce the communication overhead while mai…
☆9 Updated 3 years ago
HKBU-HPML / ddl-benchmarks
ddl-benchmarks: Benchmarks for Distributed Deep Learning
☆37 Updated 4 years ago
netx-repo / training-bottleneck
Analyze network performance in distributed training
☆18 Updated 4 years ago
Thesys-lab / parity-models
Learning-Based Coded Computation
☆47 Updated 2 years ago
pengyanghua / DL2
A deep learning-driven scheduler for elastic training in deep learning clusters
☆29 Updated 4 years ago
sjtu-epcc / DVABatch
☆19 Updated 2 years ago
H-Huang / torch_collective_extension
A minimal demo of PyTorch distributed extension functionality for collectives.
☆11 Updated 8 months ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆38 Updated 2 years ago
kungfu-team / tenplex
Dynamic resource changes for multi-dimensional parallelism training
☆25 Updated 5 months ago
qipengwang / Melon
MobiSys#114
☆21 Updated last year
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆24 Updated 5 months ago
jashwantraj92 / cocktail
☆14 Updated 8 months ago
S-Lab-System-Group / Hydro
Surrogate-based Hyperparameter Tuning System
☆28 Updated last year
CompML / survey-deep-gradient-compression
☆10 Updated 3 years ago