petuum / autodist

Simple Distributed Deep Learning on TensorFlow

☆134

Alternatives and similar repositories for autodist

Users who are interested in autodist are comparing it to the libraries listed below.

petuum / adaptdl
Resource-adaptive cluster scheduler for deep learning training.
☆448 Updated 2 years ago
spcl / substation
Research and development for optimizing transformers
☆131 Updated 4 years ago
Distributed-AI / PipeTransformer
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021
☆56 Updated 4 years ago
HKBU-HPML / ddl-benchmarks
ddl-benchmarks: Benchmarks for Distributed Deep Learning
☆36 Updated 5 years ago
ucbrise / hypersched
Deadline-based hyperparameter tuning on RayTune.
☆31 Updated 5 years ago
pytorch / tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
☆273 Updated 2 months ago
facebookresearch / fairring
Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …
☆65 Updated 3 years ago
ray-project / distml
Distributed ML Optimizer
☆34 Updated 4 years ago
saareliad / FTPipe
FTPipe and related pipeline model parallelism research.
☆43 Updated 2 years ago
awslabs / lorien
☆42 Updated 2 years ago
octoml / synr
A library for syntactically rewriting Python programs, pronounced (sinner).
☆68 Updated 3 years ago
snuspl / parallax
A Tool for Automatic Parallelization of Deep Learning Training in Distributed Multi-GPU Environments.
☆132 Updated 3 years ago
geoffxy / habitat
🔮 Execution time predictions for deep neural network training iterations across different GPUs.
☆62 Updated 2 years ago
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆161 Updated last month
SymbioticLab / Salus
Fine-grained GPU sharing primitives
☆144 Updated 2 months ago
GuanhuaWang / sensAI
sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data
☆65 Updated last year
mlcommons / training_results_v1.0
This repository contains the results and code for the MLPerf™ Training v1.0 benchmark.
☆37 Updated last year
parasj / checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
☆136 Updated 3 years ago
tbd-ai / tbd-suite
☆47 Updated 2 years ago
byteps / examples
BytePS examples (Vision, NLP, GAN, etc)
☆19 Updated 2 years ago
TalwalkarLab / paleo
An analytical performance modeling tool for deep neural networks.
☆91 Updated 5 years ago
awslabs / raf
☆145 Updated 8 months ago
msr-fiddle / pipedream
☆393 Updated 2 years ago
lsds / KungFu
Fast and Adaptive Distributed Machine Learning for TensorFlow, PyTorch and MindSpore.
☆298 Updated last year
mlcommons / training_results_v0.7
This repository contains the results and code for the MLPerf™ Training v0.7 benchmark.
☆57 Updated 2 years ago
Hsword / Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, …
☆121 Updated last year
facebookresearch / FBTT-Embedding
This is a Tensor Train based compression library to compress sparse embedding tables used in large-scale machine learning models such as …
☆194 Updated 3 years ago
AlibabaPAI / DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Model
☆76 Updated 4 years ago
stanford-mast / INFaaS
Model-less Inference Serving
☆92 Updated last year
octoml / octoml-profile
Home for OctoML PyTorch Profiler
☆114 Updated 2 years ago