microsoft / dion
Dion optimizer algorithm
☆420 Updated last week
Alternatives and similar repositories for dion
Users who are interested in dion are comparing it to the libraries listed below.
- 🧱 Modula software package ☆321 Updated 5 months ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆333 Updated 2 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆306 Updated last month
- Load compute kernels from the Hub ☆381 Updated this week
- Normalized Transformer (nGPT) ☆197 Updated last year
- PyTorch-native post-training at scale ☆600 Updated this week
- ☆289 Updated last year
- Efficient optimizers ☆280 Updated last month
- MoE training for Me and You and maybe other people ☆327 Updated 3 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆185 Updated last week
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆98 Updated 6 months ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆472 Updated 2 weeks ago
- Simple MPI implementation for prototyping or learning ☆299 Updated 5 months ago
- Quantized LLM training in pure CUDA/C++. ☆233 Updated last week
- Minimal yet performant LLM examples in pure JAX ☆233 Updated 2 weeks ago
- supporting pytorch FSDP for optimizers ☆84 Updated last year
- Open-source framework for the research and development of foundation models. ☆731 Updated this week
- Training API and CLI ☆318 Updated last week
- Implementation of Diffusion Transformer (DiT) in JAX ☆305 Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆229 Updated 7 months ago
- ☆127 Updated last week
- seqax = sequence modeling + JAX ☆170 Updated 6 months ago
- Scalable and Performant Data Loading ☆363 Updated last week
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism. ☆157 Updated 2 months ago
- Supporting code for the blog post on modular manifolds. ☆113 Updated 4 months ago
- Accelerated First Order Parallel Associative Scan ☆194 Updated 3 weeks ago
- ☆229 Updated 2 months ago
- ☆70 Updated last year
- DeMo: Decoupled Momentum Optimization ☆198 Updated last year
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆818 Updated last week