AIS-SNU / Optimus-CCLinks

[ASPLOS'23] Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression

☆6

Alternatives and similar repositories for Optimus-CC

Users that are interested in Optimus-CC are comparing it to the libraries listed below

Sorting:

RC4ML / RPCNIC
RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]
☆11Updated 7 months ago
casys-kaist / EnvPipe
☆25Updated last year
PSAL-POSTECH / M2NDP-public
A Cycle-level simulator for M2NDP
☆28Updated 2 months ago
rishucoding / reproduce_MICRO24_GPU_DLRM_inference
Sharing the codebase and steps for artifact evaluation/reproduction for MICRO 2024 paper
☆9Updated 10 months ago
AIS-SNU / PID-Comm
☆24Updated 7 months ago
strongh2 / sc22-ae
☆13Updated 2 years ago
casys-kaist / HUVM
☆24Updated 2 years ago
hku-systems / naspipe
☆14Updated 3 years ago
ParCIS / Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…
☆26Updated 2 years ago
PKUZHOU / PetS-ATC-2022
☆10Updated last year
sjtu-epcc / DVABatch
☆19Updated 3 years ago
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆52Updated last year
spcl / DNN-cpp-proxies
C++/MPI proxies for distributed training of deep neural networks.
☆13Updated 3 years ago
sarchlab / triosim
☆24Updated 2 weeks ago
leesou / PIM-DL-ASPLOS
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
☆31Updated last year
AIS-SNU / Smart-Infinity
[HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
☆46Updated last year
qgwang-hust / GraSU
A Fast Graph Update Library for FPGA-based Dynamic Graph Processing
☆9Updated 3 years ago
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆23Updated 2 months ago
MLNetwork / rostam
☆24Updated 4 years ago
ucb-bar / dosa
DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators
☆15Updated 9 months ago
PKUZHOU / NeoMem-MICRO-2024
The Artifact of NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
☆53Updated 11 months ago
Sys-KU / DeepPlan
[ACM EuroSys '23] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆57Updated last year
kazukiosawa / pipe-fisher
☆9Updated 2 years ago
tallendev / uvm-eval
This serves as a repository for reproducibility of the SC21 paper "In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated…
☆33Updated last year
OSU-STARLAB / UVM_benchmark
☆27Updated 4 years ago
dskarlatos / ElasticCuckooHashing
(elastic) cuckoo hashing
☆14Updated 5 years ago
zhuangwang93 / Espresso
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '2…
☆15Updated last year
barabanshek / Dagger
HW/SW co-designed end-host RPC stack
☆20Updated 3 years ago
jasperzhong / swift
☆14Updated 3 years ago
CMU-SAFARI / MemBen
Benchmark suite containing cache filtered traces for use with Ramulator. These include some of the workloads used in our SIGMETRICS 2019 …
☆22Updated 4 years ago