msr-fiddle / piper

☆9

Alternatives and similar repositories for piper

Users who are interested in piper are comparing it to the libraries listed below.

parasailteam / coconet
☆79Updated 2 years ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆67Updated 3 months ago
zhuohan123 / terapipe
☆75Updated 4 years ago
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆48Updated this week
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46Updated last year
hgyhungry / ShflBW_Sparse_NN
☆16Updated 2 years ago
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆144Updated 3 years ago
HPMLL / DTC-SpMM_ASPLOS24
☆33Updated last year
HPMLL / SpInfer_EuroSys25
☆19Updated 3 months ago
microsoft / SparTA
☆149Updated 11 months ago
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89Updated 2 years ago
apuaaChen / vectorSparse
☆31Updated 2 years ago
calculon-ai / calculon
☆143Updated last year
uchuhimo / amanda
☆18Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆82Updated 2 years ago
HPDL-Group / Merak
☆80Updated 2 months ago
microsoft / msccl-tools
Synthesizer for optimal collective communication algorithms
☆110Updated last year
zhaiyi000 / tlm
☆41Updated last year
mutinifni / splitwise-sim
LLM serving cluster simulator
☆107Updated last year
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆186Updated 9 months ago
CRAFT-THU / RoDe
A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs
☆22Updated last year
ranggihwang / Pregated_MoE
☆48Updated last year
YukeWang96 / QGTC_PPoPP22
Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core.
☆30Updated 3 years ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆113Updated last week
goliaro / specinfer-ae
☆23Updated last year
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆142Updated last year
tlc-pack / tenset
☆92Updated 2 years ago
nullplay / Workload-Aware-Co-Optimization
Workload-Aware Co-Optimization
☆8Updated 2 years ago
sitar-lab / NeuSight
☆45Updated 3 weeks ago
pku-liang / AMOS
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
☆113Updated 2 years ago