microsoft / dion
Dion optimizer algorithm
☆424 Updated 3 weeks ago
Alternatives and similar repositories for dion
Users who are interested in dion are comparing it to the libraries listed below:
- 🧱 Modula software package ☆322 Updated 5 months ago
- MoE training for Me and You and maybe other people ☆335 Updated last month
- PyTorch-native post-training at scale ☆613 Updated this week
- Load compute kernels from the Hub ☆389 Updated last week
- Simple & Scalable Pretraining for Neural Architecture Research ☆307 Updated 2 months ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆334 Updated 3 months ago
- ☆289 Updated last year
- Normalized Transformer (nGPT) ☆198 Updated last year
- ☆147 Updated this week
- Efficient optimizers ☆281 Updated last month
- Minimal yet performant LLM examples in pure JAX ☆236 Updated 3 weeks ago
- Supporting code for the blog post on modular manifolds. ☆115 Updated 4 months ago
- torchax is a PyTorch frontend for JAX. It gives JAX the ability to author JAX programs using familiar PyTorch syntax. It also provides JA… ☆175 Updated this week
- supporting pytorch FSDP for optimizers ☆84 Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆186 Updated 2 weeks ago
- Open-source framework for the research and development of foundation models. ☆752 Updated this week
- Quantized LLM training in pure CUDA/C++. ☆235 Updated 2 weeks ago
- Training API and CLI ☆325 Updated last week
- Simple MPI implementation for prototyping or learning ☆300 Updated 6 months ago
- ☆70 Updated last year
- ☆957 Updated 3 months ago
- ☆232 Updated 2 months ago
- seqax = sequence modeling + JAX ☆170 Updated 6 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆352 Updated 2 months ago
- ☆92 Updated last year
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism. ☆157 Updated 2 months ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆475 Updated this week
- ☆304 Updated last week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆827 Updated last week
- Scalable and Performant Data Loading ☆364 Updated this week