sands-lab / omnireduce

☆68

Alternatives and similar repositories for omnireduce

Users who are interested in omnireduce are comparing it to the repositories listed below.

suquark / hoplite
☆44 Updated 4 years ago
netx-repo / PipeSwitch
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
☆126 Updated 3 years ago
zhuangwang93 / Espresso
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '2…
☆15 Updated 2 years ago
in-ATP / ATP
☆84 Updated 3 years ago
HKBU-HPML / ddl-benchmarks
ddl-benchmarks: Benchmarks for Distributed Deep Learning
☆36 Updated 5 years ago
byteps / examples
BytePS examples (Vision, NLP, GAN, etc)
☆19 Updated 3 years ago
Rivendile / Muri
Artifacts for our SIGCOMM'22 paper Muri
☆44 Updated last year
jasperzhong / swift
☆15 Updated 3 years ago
uwsampl / nexus
☆82 Updated 5 months ago
SJTU-IPADS / reef-artifacts
A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.
☆43 Updated 3 years ago
casys-kaist / EnvPipe
☆25 Updated 2 years ago
netx-repo / training-bottleneck
Analyze network performance in distributed training
☆19 Updated 5 years ago
msr-fiddle / CheckFreq
☆57 Updated 4 years ago
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆57 Updated last year
SJTU-IPADS / reef
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…
☆103 Updated 2 years ago
microsoft / taccl
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
☆77 Updated 2 years ago
uw-mad-dash / shockwave
Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
☆46 Updated 3 years ago
stanford-mast / INFaaS
Model-less Inference Serving
☆92 Updated 2 years ago
crazyboycjr / nethint
The prototype for NSDI paper "NetHint: White-Box Networking for Multi-Tenant Data Centers"
☆26 Updated last year
crossroadsfpga / enso
Ensō is a high-performance streaming interface for NIC-application communication.
☆76 Updated 3 months ago
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆91 Updated 2 years ago
SymbioticLab / Tiresias
Tiresias is a GPU cluster manager for distributed deep learning training.
☆164 Updated 5 years ago
msr-fiddle / harmony
☆17 Updated 2 years ago
ParCIS / Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…
☆27 Updated 2 years ago
microsoft / TE-CCL
☆43 Updated last year
gudiandian / ElasticFlow
☆16 Updated last year
S-Lab-System-Group / Awesome-ML-for-System
SOTA Learning-augmented Systems
☆37 Updated 3 years ago
bytedance / QSync
Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20 Updated last year
phoenix-dataplane / mCCS
Managed collective communication service
☆22 Updated last year
SymbioticLab / Justitia
Justitia provides RDMA isolation between applications with diverse requirements.
☆42 Updated 3 years ago