LambdaLabsML / distributed-training-guideLinks

Best practices & guides on how to write distributed pytorch training code

☆517

Alternatives and similar repositories for distributed-training-guide

Users that are interested in distributed-training-guide are comparing it to the libraries listed below

Sorting:

huggingface / picotron_tutorial
☆224Updated last week
EleutherAI / cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
☆820Updated 2 months ago
rwitten / HighPerfLLMs2024
☆542Updated last year
srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆1,118Updated last year
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆202Updated 7 months ago
Quentin-Anthony / torch-profiling-tutorial
☆510Updated 2 months ago
HazyResearch / aisys-building-blocks
Building blocks for foundation models.
☆567Updated last year
srush / annotated-mamba
Annotated version of the Mamba paper
☆489Updated last year
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆193Updated 4 months ago
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆421Updated 7 months ago
changjonathanc / flex-nano-vllm
FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
☆300Updated 2 months ago
facebookresearch / optimizers
For optimization algorithm research and development.
☆543Updated last week
McGill-NLP / nano-aha-moment
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
☆538Updated 3 weeks ago
huggingface / picotron
Minimalistic 4D-parallelism distributed training framework for education purpose
☆1,863Updated 2 months ago
marin-community / marin
Open-source framework for the research and development of foundation models.
☆532Updated this week
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆580Updated 2 months ago
gpu-mode / profiling-cuda-in-torch
☆174Updated last year
meta-pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆436Updated this week
srush / Transformer-Puzzles
Puzzles for exploring transformers
☆373Updated 2 years ago
jax-ml / scaling-book
Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
☆658Updated last week
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆270Updated 3 months ago
lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆542Updated 5 months ago
hkproj / triton-flash-attention
☆209Updated 9 months ago
srush / Autodiff-Puzzles
☆456Updated last year
meta-pytorch / attention-gym
Helpful tools and examples for working with flex-attention
☆1,029Updated this week
microsoft / dion
Dion optimizer algorithm
☆369Updated 3 weeks ago
huggingface / kernels
Load compute kernels from the Hub
☆304Updated last week
thinking-machines-lab / batch_invariant_ops
☆843Updated 2 weeks ago
mlops-discord / gpu-optimization-workshop
Slides, notes, and materials for the workshop
☆333Updated last year
facebookresearch / spdl
Scalable and Performant Data Loading
☆311Updated last week