LambdaLabsML / distributed-training-guide
Best practices and guides on how to write distributed PyTorch training code
☆342 · Updated this week
Alternatives and similar repositories for distributed-training-guide:
Users interested in distributed-training-guide are comparing it to the libraries listed below.
- Minimalistic 4D-parallelism distributed training framework for educational purposes (☆670, updated this week)
- For optimization algorithm research and development (☆486, updated last week)
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton (☆511, updated this week)
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wall-clock overhead (☆220, updated 3 weeks ago)
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code (☆171, updated this week)
- Building blocks for foundation models (☆440, updated last year)
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (☆216, updated this week)
- Scalable and Performant Data Loading (☆210, updated this week)
- UNet diffusion model in pure CUDA (☆596, updated 7 months ago)
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… (☆288, updated last month)
- System 2 Reasoning Link Collection (☆751, updated this week)
- What would you do with 1000 H100s... (☆970, updated last year)
- Normalized Transformer (nGPT) (☆146, updated 2 months ago)
- Implementation of Diffusion Transformer (DiT) in JAX (☆261, updated 7 months ago)
- Annotated version of the Mamba paper (☆470, updated 11 months ago)
- Code for Adam-mini: Use Fewer Learning Rates To Gain More (https://arxiv.org/abs/2406.16793) (☆383, updated last month)
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models (☆758, updated last week)
- Efficient LLM Inference over Long Sequences (☆349, updated last month)
- Helpful tools and examples for working with flex-attention (☆603, updated this week)
- An Open Source Toolkit For LLM Distillation (☆442, updated 3 weeks ago)
- PyTorch per-step fault tolerance (actively under development) (☆226, updated this week)
- LLM KV cache compression made easy (☆356, updated this week)
- Training Large Language Models to Reason in a Continuous Latent Space (☆746, updated this week)
- Code for training and evaluating Contextual Document Embedding models (☆166, updated 2 weeks ago)
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch (☆498, updated 3 months ago)
- Textbook on reinforcement learning from human feedback (☆154, updated this week)
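One entry above describes memory layers: a trainable key-value lookup that adds parameters to a model without increasing per-query FLOPs. A minimal pure-Python sketch of that lookup idea follows, assuming dot-product similarity, top-k selection, and softmax weighting; the class and parameter names here are hypothetical and illustrative only, not taken from any of the listed repositories.

```python
import math

class MemoryLookup:
    """Toy sketch of a key-value memory lookup (illustrative, not a real API).

    A query vector is scored against a fixed table of keys; only the top-k
    matches select values, which are blended with softmax weights. The table
    can grow (more parameters) while each query still touches only k entries,
    so per-query compute stays roughly constant.
    """

    def __init__(self, keys, values, k=2):
        self.keys = keys      # list of key vectors
        self.values = values  # list of value vectors, parallel to keys
        self.k = k            # number of memory slots read per query

    @staticmethod
    def _dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def __call__(self, query):
        # Score every key against the query, keep the k highest scores.
        scored = [(self._dot(query, key), i) for i, key in enumerate(self.keys)]
        top = sorted(scored, reverse=True)[: self.k]
        # Softmax over only the selected scores (shift by max for stability).
        m = max(score for score, _ in top)
        exps = [math.exp(score - m) for score, _ in top]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the selected value vectors.
        dim = len(self.values[0])
        out = [0.0] * dim
        for w, (_, i) in zip(weights, top):
            for d in range(dim):
                out[d] += w * self.values[i][d]
        return out
```

In a real memory layer the keys and values would be trainable tensors and the top-k search would use an efficient (often product-quantized) index; this sketch only shows the sparse read pattern that keeps FLOPs flat as the table grows.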