kwai / Megatron-KwaiLinks

[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism

☆61

Alternatives and similar repositories for Megatron-Kwai

Users that are interested in Megatron-Kwai are comparing it to the libraries listed below

Sorting:

madsys-dev / deepseekv2-profile
☆145Updated 4 months ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆134Updated 2 weeks ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆114Updated this week
LoongServe / LoongServe
☆109Updated 8 months ago
InternLM / Awesome-LLM-Training-System
☆42Updated 11 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆154Updated last month
stepfun-ai / StepMesh
☆24Updated last week
AlibabaPAI / FLASHNN
☆96Updated 10 months ago
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆144Updated 3 years ago
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆253Updated this week
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆411Updated 2 months ago
Victarry / PP-Schedule-Visualization
Pipeline Parallelism Emulation and Visualization
☆54Updated last month
thunlp / Seq1F1B
Sequence-level 1F1B schedule for LLMs.
☆29Updated last month
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆128Updated 6 months ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆405Updated 2 months ago
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆110Updated last year
fzyzcjy / torch_memory_saver
Allow torch tensor memory to be released and resumed later
☆93Updated 3 weeks ago
AlibabaPAI / torchacc
PyTorch distributed training acceleration framework
☆51Updated 5 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆264Updated 4 months ago
lhb8125 / Megatron-LM
Ongoing research training transformer models at scale
☆18Updated 3 weeks ago
OpenPPL / ppl.llm.kernel.cuda
☆149Updated 6 months ago
FlagOpen / FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
☆273Updated last year
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆103Updated 2 months ago
zhuohan123 / terapipe
☆75Updated 4 years ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆216Updated last year
OpenPPL / ppl.nn.llm
☆139Updated last year
CalebDu / Awesome-Cute
☆89Updated 2 months ago
LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆654Updated 3 months ago
chenhongyu2048 / LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
☆88Updated last month
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆138Updated this week