ray-project / distml

Distributed ML Optimizer

☆32

Alternatives and similar repositories for distml

Users who are interested in distml are comparing it to the libraries listed below.

petuum / autodist
Simple Distributed Deep Learning on TensorFlow
☆133 Updated last month
pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆158 Updated last month
ray-project / ray_shuffling_data_loader
A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra…
☆18 Updated 2 years ago
octoml / synr
A library for syntactically rewriting Python programs, pronounced (sinner).
☆69 Updated 3 years ago
zhisbug / ray-scalable-ml-design
Some microbenchmarks and design docs before commencement
☆12 Updated 4 years ago
ucbrise / hypersched
Deadline-based hyperparameter tuning on RayTune.
☆31 Updated 5 years ago
feature-store / ralf
☆30 Updated 2 years ago
pytorch / tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
☆260 Updated this week
GuanhuaWang / sensAI
sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data
☆64 Updated last year
ryantd / veloce
WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
☆18 Updated 3 years ago
pytorch / torchdistx
Torch Distributed Experimental
☆117 Updated last year
anyscale / llm-continuous-batching-benchmarks
☆120 Updated last year
microsoft / varuna
☆251 Updated last year
facebookresearch / fairring
Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …
☆65 Updated 3 years ago
spcl / substation
Research and development for optimizing transformers
☆129 Updated 4 years ago
eth-easl / mixtera
A lightweight, user-friendly data-plane for LLM training.
☆24 Updated 2 weeks ago
octoml / octoml-profile
Home for OctoML PyTorch Profiler
☆113 Updated 2 years ago
nums-project / nums
A library that translates Python and NumPy to optimized distributed systems code.
☆132 Updated 2 years ago
Michaelvll / llm-ie-benchmarks
A collection of reproducible inference engine benchmarks
☆32 Updated 3 months ago
petuum / adaptdl
Resource-adaptive cluster scheduler for deep learning training.
☆447 Updated 2 years ago
pytorch / rfcs
PyTorch RFCs (experimental)
☆134 Updated 2 months ago
NVIDIA / LDDL
Distributed preprocessing and data loading for language datasets
☆39 Updated last year
facebookresearch / FBTT-Embedding
This is a Tensor Train based compression library to compress sparse embedding tables used in large-scale machine learning models such as …
☆194 Updated 3 years ago
hpcaitech / TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
☆118 Updated 8 months ago
ray-project / ray_lightning
PyTorch Lightning Distributed Accelerators using Ray
☆213 Updated last year
facebookresearch / DLRM-FlexFlow
Development repository for integrating FlexFlow (A distributed deep learning framework that supports flexible parallelization strategies)…
☆28 Updated 3 years ago
LiuShuai26 / Distributed-RL
Distributed DRL by Ray and TensorFlow Tutorial.
☆10 Updated 5 years ago
DS3Lab / DT-FM
☆94 Updated 3 years ago
HazyResearch / mongoose
A Learnable LSH Framework for Efficient NN Training
☆32 Updated 4 years ago
parasj / checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
☆132 Updated 3 years ago