Shigangli / Ok-Topk

Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm (communication volume below 6k, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer. Its convergence is proved theoretically and validated empirically.
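As a rough illustration of what training with sparse gradients involves, the sketch below keeps only the k largest-magnitude entries of each worker's gradient and sums the resulting sparse vectors. It is not the Ok-Topk allreduce algorithm and makes no claim about its communication volume. The function names `topk_sparsify` and `sparse_sum`, the residual handling, and the toy gradients are all assumptions made for illustration.

```python
import heapq


def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad.

    Returns (sparse, residual): sparse maps index -> value for the kept
    entries; residual holds the dropped entries (zeros where kept), which
    a worker could add to its next gradient. Hypothetical helper.
    """
    top_idx = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    sparse = {i: grad[i] for i in top_idx}
    residual = [0.0 if i in sparse else g for i, g in enumerate(grad)]
    return sparse, residual


def sparse_sum(sparse_grads, length):
    """Naively sum the sparse gradients of all workers into a dense list.

    This stands in for a sparse allreduce; a real implementation would
    exchange (index, value) pairs between workers instead.
    """
    total = [0.0] * length
    for sparse in sparse_grads:
        for i, v in sparse.items():
            total[i] += v
    return total


# Toy gradients for two simulated workers (made-up values).
worker_grads = [
    [0.1, -2.0, 0.3, 4.0],
    [1.5, 0.2, -3.0, 0.1],
]
results = [topk_sparsify(g, k=2) for g in worker_grads]
reduced = sparse_sum([sparse for sparse, _ in results], length=4)
print(reduced)
```

Each worker here contributes only k (index, value) pairs rather than a full dense gradient, which is the source of the communication savings that sparse allreduce schemes aim for.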

☆23

Related projects

Alternatives and complementary repositories for Ok-Topk

zhuangwang93 / Espresso
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '2…
☆15 Updated last year
casys-kaist / HUVM
☆23 Updated 2 years ago
YukeWang96 / MGG_OSDI23
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult…
☆37 Updated 8 months ago
LLMServe / dLoRA-artifact
☆14 Updated 5 months ago
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆52 Updated 6 months ago
hku-systems / naspipe
☆14 Updated 2 years ago
PKUZHOU / PetS-ATC-2022
☆9 Updated last year
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆34 Updated last year
Rivendile / Muri
Artifacts for our SIGCOMM'22 paper Muri
☆40 Updated 10 months ago
casys-kaist / EnvPipe
☆23 Updated last year
zhuangwang93 / Cupcake
Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training (MLSys '23)
☆9 Updated last year
sjtu-epcc / DVABatch
☆18 Updated 2 years ago
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆43 Updated last year
casys-kaist / glet
☆41 Updated last year
platformxlab / G10
☆33 Updated last year
SJTU-IPADS / ugache
☆23 Updated last year
UMass-LIDS / Proteus
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
☆8 Updated 8 months ago
jasperzhong / swift
☆13 Updated 2 years ago
Shigangli / Chimera
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines.
☆46 Updated 11 months ago
YukeWang96 / QGTC_PPoPP22
Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core.
☆27 Updated 2 years ago
Mutinifni / splitwise-sim
LLM serving cluster simulator
☆81 Updated 6 months ago
Raphael-Hao / Abacus
☆37 Updated 3 years ago
guessmewho233 / CoGNN_info_for_SC22
☆8 Updated 2 years ago
parasailteam / coconet
☆73 Updated last year
sands-lab / omnireduce
☆69 Updated last year
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆54 Updated 3 months ago
rkhan055 / SHADE
SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training
☆29 Updated last year
S-Lab-System-Group / Awesome-ML-for-System
SOTA Learning-augmented Systems
☆33 Updated 2 years ago
SJTU-IPADS / reef
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…
☆85 Updated last year
msr-fiddle / dnn-partitioning
☆38 Updated 4 years ago