netx-repo / training-bottleneck

Analyze network performance in distributed training

☆19

Alternatives and similar repositories for training-bottleneck

Users who are interested in training-bottleneck are comparing it to the repositories listed below.

netx-repo / PipeSwitch
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
☆126 Updated 3 years ago
SymbioticLab / Salus
Fine-grained GPU sharing primitives
☆147 Updated 3 months ago
SymbioticLab / Tiresias
Tiresias is a GPU cluster manager for distributed deep learning training.
☆163 Updated 5 years ago
HKUST-SING / herald
Herald: Accelerating Neural Recommendation Training with Embedding Scheduling (NSDI 2024)
☆23 Updated last year
stanford-futuredata / gavel
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
☆132 Updated last year
msr-fiddle / CheckFreq
☆57 Updated 4 years ago
msr-fiddle / philly-traces
☆196 Updated 6 years ago
byteps / examples
BytePS examples (Vision, NLP, GAN, etc.)
☆19 Updated 2 years ago
Rivendile / Muri
Artifacts for our SIGCOMM'22 paper Muri
☆44 Updated last year
S-Lab-System-Group / HeliosData
Helios Traces from SenseTime
☆59 Updated 3 years ago
microsoft / taccl
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
☆76 Updated 2 years ago
stanford-mast / INFaaS
Model-less Inference Serving
☆91 Updated 2 years ago
suquark / hoplite
☆44 Updated 4 years ago
uw-mad-dash / shockwave
Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
☆45 Updated 2 years ago
uwsampl / nexus
☆82 Updated 5 months ago
microsoft / msccl-tools
Synthesizer for optimal collective communication algorithms
☆119 Updated last year
Raphael-Hao / Abacus
☆38 Updated 4 months ago
msr-fiddle / synergy
☆51 Updated 2 years ago
uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆53 Updated 2 years ago
zhuangwang93 / Espresso
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '2…
☆15 Updated 2 years ago
msr-fiddle / DS-Analyzer
☆38 Updated 4 years ago
casys-kaist / glet
☆53 Updated 10 months ago
SJTU-IPADS / reef
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…
☆103 Updated 2 years ago
geoffxy / habitat
🔮 Execution time predictions for deep neural network training iterations across different GPUs.
☆62 Updated 2 years ago
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆55 Updated last year
sands-lab / omnireduce
☆68 Updated 2 years ago
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆91 Updated 2 years ago
SJTU-IPADS / reef-artifacts
A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.
☆43 Updated 3 years ago
mlcommons / chakra-old
Repository for MLCommons Chakra schema and tools
☆39 Updated last year
HuaizhengZhang / MIGProfiler
Multi-Instance-GPU profiling tool
☆60 Updated 2 years ago