ConnollyLeon / awesome-Auto-Parallelism

A baseline repository of Auto-Parallelism in Training Neural Networks

☆144

Alternatives and similar repositories for awesome-Auto-Parallelism

Users who are interested in awesome-Auto-Parallelism are comparing it to the repositories listed below.

microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆114 Updated 3 weeks ago
parasailteam / coconet
☆80 Updated 2 years ago
HPDL-Group / Merak
☆80 Updated 2 months ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆67 Updated 4 months ago
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆83 Updated 2 years ago
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆136 Updated last week
mutinifni / splitwise-sim
LLM serving cluster simulator
☆108 Updated last year
lambda7xx / awesome-AI-system
paper and its code for AI System
☆317 Updated 3 months ago
Shenggan / awesome-distributed-ml
A curated list of awesome projects and papers for distributed training or inference
☆239 Updated 9 months ago
HPMLL / BurstGPT
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆189 Updated last week
kwai / Megatron-Kwai
[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral…
☆60 Updated last year
zhuohan123 / terapipe
☆75 Updated 4 years ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap…
☆263 Updated 4 months ago
ColfaxResearch / cfx-article-src
☆126 Updated 2 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆154 Updated last month
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆128 Updated 6 months ago
calculon-ai / calculon
☆145 Updated last year
microsoft / msccl
Microsoft Collective Communication Library
☆352 Updated last year
AlibabaPAI / DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Model
☆77 Updated 4 years ago
microsoft / msccl-tools
Synthesizer for optimal collective communication algorithms
☆110 Updated last year
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆142 Updated 6 months ago
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46 Updated last year
alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆79 Updated 8 months ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆405 Updated 2 months ago
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆368 Updated 10 months ago
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆60 Updated last year
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆62 Updated last year
zhaiyi000 / tlm
☆42 Updated last year
SJTU-IPADS / reef
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…
☆96 Updated 2 years ago
LoongServe / LoongServe
☆109 Updated 8 months ago