Youhe-Jiang / IJCAI2023-OptimalShardedDataParallel

[IJCAI2023] An automated parallel training system that combines the advantages of both data and model parallelism. If you are interested, please visit, star, or fork https://github.com/Youhe-Jiang/OptimalShardedDataParallel

☆52

Alternatives and similar repositories for IJCAI2023-OptimalShardedDataParallel

Users who are interested in IJCAI2023-OptimalShardedDataParallel compare it to the libraries listed below.

zhuohan123 / terapipe
☆77 Updated 4 years ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆43 Updated 3 years ago
HPDL-Group / Merak
☆80 Updated 6 months ago
microsoft / SuperScaler
An experimental parallel training platform
☆56 Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆91 Updated 2 years ago
Relaxed-System-Lab / HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆31 Updated last year
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆119 Updated last month
SymbioticLab / Oobleck
A resilient distributed training framework
☆96 Updated last year
parasailteam / coconet
☆83 Updated 2 years ago
tonyzhao-jt / LLM-PQ
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …
☆35 Updated 2 months ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆68 Updated 8 months ago
LiuXiaoxuanPKU / GACT-ICML
☆43 Updated 3 years ago
Hsword / Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, …
☆123 Updated last year
microsoft / SparTA
☆159 Updated last year
AlibabaPAI / DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Model
☆76 Updated 4 years ago
Distributed-AI / PipeTransformer
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021
☆56 Updated 4 years ago
thu-pacman / FasterMoE
☆88 Updated 3 years ago
lzhangbv / dear_pytorch
[ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining
☆11 Updated last year
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆147 Updated 3 years ago
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46 Updated 2 years ago
zhaiyi000 / tlp
☆41 Updated last year
alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆80 Updated last year
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆187 Updated last month
hao-ai-lab / MuxServe
☆79 Updated last month
kungfu-team / tenplex
Dynamic resources changes for multi-dimensional parallelism training
☆29 Updated 2 months ago
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆35 Updated 2 years ago
geoffxy / habitat
🔮 Execution time predictions for deep neural network training iterations across different GPUs.
☆62 Updated 2 years ago
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆220 Updated 3 months ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆223 Updated 2 years ago
casys-kaist / EnvPipe
☆25 Updated 2 years ago