HKBU-HPML / ddl-benchmarks

ddl-benchmarks: Benchmarks for Distributed Deep Learning

☆36

Alternatives and similar repositories for ddl-benchmarks

Users who are interested in ddl-benchmarks often compare it to the repositories listed below.

netx-repo / PipeSwitch
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
☆126 Updated 3 years ago
SymbioticLab / Salus
Fine-grained GPU sharing primitives
☆147 Updated 4 months ago
mlbench / mlbench-benchmarks
Distributed ML Training Benchmarks
☆27 Updated 2 years ago
byteps / examples
BytePS examples (Vision, NLP, GAN, etc)
☆19 Updated 3 years ago
pengyanghua / optimus
A Deep Learning Cluster Scheduler
☆38 Updated 4 years ago
xldrx / tictac
☆22 Updated 6 years ago
uwsampl / nexus
☆82 Updated 5 months ago
AlibabaPAI / DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Model
☆76 Updated 4 years ago
anandj91 / p3
☆21 Updated 3 years ago
stanford-mast / INFaaS
Model-less Inference Serving
☆92 Updated 2 years ago
sands-lab / omnireduce
☆68 Updated 2 years ago
kanonjz / paper
Machine Learning System
☆14 Updated 5 years ago
saareliad / FTPipe
FTPipe and related pipeline model parallelism research.
☆43 Updated 2 years ago
netx-repo / training-bottleneck
Analyze network performance in distributed training
☆19 Updated 5 years ago
hwang595 / PyTorch-parameter-server
Implementation of Parameter Server using PyTorch communication lib
☆42 Updated 6 years ago
petuum / autodist
Simple Distributed Deep Learning on TensorFlow
☆134 Updated 5 months ago
UofT-EcoSystem / hfta
Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion
☆32 Updated last year
awslabs / lorien
☆42 Updated 2 years ago
msr-fiddle / DS-Analyzer
☆38 Updated 4 years ago
Distributed-AI / PipeTransformer
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021
☆56 Updated 4 years ago
sands-lab / grace
GRACE - GRAdient ComprEssion for distributed deep learning
☆139 Updated last year
shriramsb / vdnn-plus-plus
Implementation of vDNN++; an improvement over vDNN
☆18 Updated 6 years ago
linnanwang / superneurons-release
This is the release repository of superneurons
☆54 Updated 4 years ago
SymbioticLab / Tiresias
Tiresias is a GPU cluster manager for distributed deep learning training.
☆164 Updated 5 years ago
HPDL-Group / Merak
☆81 Updated 6 months ago
BaguaSys / bagua-net
High performance NCCL plugin for Bagua.
☆15 Updated 4 years ago
shriramsb / vDNN
☆22 Updated 7 years ago
msr-fiddle / harmony
☆17 Updated 2 years ago
Youhe-Jiang / IJCAI2023-OptimalShardedDataParallel
[IJCAI2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte…
☆52 Updated 2 years ago
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆35 Updated 2 years ago