LambdaLabsML / distributed-training-guide
Best practices & guides on how to write distributed PyTorch training code
☆286 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for distributed-training-guide
- For optimization algorithm research and development. ☆449 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆483 · Updated 3 weeks ago
- ☆292 · Updated 4 months ago
- Annotated version of the Mamba paper ☆457 · Updated 8 months ago
- Fast bare-bones BPE for modern tokenizer training ☆142 · Updated last month
- UNet diffusion model in pure CUDA ☆584 · Updated 4 months ago
- System 2 Reasoning Link Collection ☆693 · Updated 3 weeks ago
- A comprehensive deep dive into the world of tokens ☆214 · Updated 4 months ago
- ☆133 · Updated 9 months ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆252 · Updated 5 months ago
- Code for training & evaluating Contextual Document Embedding models ☆117 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆193 · Updated this week
- Website for hosting the Open Foundation Models Cheat Sheet. ☆257 · Updated 4 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆229 · Updated 3 weeks ago
- Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models" ☆157 · Updated 9 months ago
- A puzzle to learn about prompting ☆121 · Updated last year
- Helpful tools and examples for working with flex-attention ☆469 · Updated 3 weeks ago
- ☆139 · Updated 3 months ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆151 · Updated 7 months ago
- LoRA and DoRA from Scratch Implementations ☆188 · Updated 8 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆715 · Updated last month
- NanoGPT (124M) quality in 7.8 8xH100-minutes ☆1,033 · Updated this week
- PyTorch implementation of models from the Zamba2 series. ☆158 · Updated this week
- What would you do with 1000 H100s... ☆903 · Updated 10 months ago
- A bibliography and survey of the papers surrounding o1 ☆754 · Updated this week
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆173 · Updated 4 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆172 · Updated 2 months ago
- Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines ☆195 · Updated 6 months ago
- ☆129 · Updated 3 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆84 · Updated last week