lzhangbv / dear_pytorch

[ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining

☆12

Alternatives and similar repositories for dear_pytorch

Users who are interested in dear_pytorch are comparing it to the repositories listed below.

bytedance / QSync
Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20 Updated last year
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆40 Updated 2 years ago
Distributed-AI / PipeTransformer
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021
☆56 Updated 4 years ago
Youhe-Jiang / IJCAI2023-OptimalShardedDataParallel
[IJCAI2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte…
☆52 Updated 2 years ago
sjtu-epcc / DVABatch
☆20 Updated 3 years ago
HKBU-HPML / ddl-benchmarks
ddl-benchmarks: Benchmarks for Distributed Deep Learning
☆36 Updated 5 years ago
osayamenja / Kleos
Complete GPU residency for ML.
☆37 Updated last week
saareliad / FTPipe
FTPipe and related pipeline model parallelism research.
☆41 Updated 2 years ago
casys-kaist / EnvPipe
☆25 Updated last year
kanonjz / paper
Machine Learning System
☆14 Updated 5 years ago
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆53 Updated 11 months ago
byteps / examples
BytePS examples (Vision, NLP, GAN, etc.)
☆19 Updated 2 years ago
zhuohan123 / terapipe
☆75 Updated 4 years ago
BaguaSys / bagua-net
High performance NCCL plugin for Bagua.
☆15 Updated 3 years ago
SymbioticLab / Oobleck
A resilient distributed training framework
☆94 Updated last year
HPDL-Group / Merak
☆81 Updated 2 months ago
microsoft / SuperScaler
An experimental parallel training platform
☆54 Updated last year
facebookresearch / fairring
Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …
☆65 Updated 3 years ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆66 Updated 4 months ago
mlbench / mlbench-benchmarks
Distributed ML Training Benchmarks
☆27 Updated 2 years ago
sands-lab / omnireduce
☆69 Updated 2 years ago
hao-ai-lab / MuxServe
☆67 Updated last year
feifeibear / PSTensor
PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
☆10 Updated 3 years ago
jiazhihao / attention_superoptimizer
An Attention Superoptimizer
☆22 Updated 6 months ago
zhisbug / Cavs
Cavs: An Efficient Runtime System for Dynamic Neural Networks
☆14 Updated 4 years ago
thu-pacman / FasterMoE
☆85 Updated 3 years ago
Azure / msccl-executor-nccl
☆44 Updated 7 months ago
tonyzhao-jt / LLM-PQ
Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"
☆34 Updated last month
abcdabcd987 / libfabric-efa-demo
☆48 Updated 7 months ago
zhuzilin / pytorch-malloc
An external memory allocator example for PyTorch.
☆14 Updated 3 years ago