microsoft / dion
Dion optimizer algorithm
☆384 Updated this week
Alternatives and similar repositories for dion
Users who are interested in dion are comparing it to the libraries listed below.
- 🧱 Modula software package ☆303 Updated 3 months ago
- ☆285 Updated last year
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆326 Updated last week
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆303 Updated 2 weeks ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆300 Updated 3 weeks ago
- Efficient optimizers ☆275 Updated last week
- Normalized Transformer (nGPT) ☆192 Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆173 Updated 4 months ago
- Supporting PyTorch FSDP for optimizers ☆84 Updated 11 months ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆296 Updated last year
- Supporting code for the blog post on modular manifolds. ☆102 Updated last month
- PyTorch-native post-training at scale ☆532 Updated this week
- Load compute kernels from the Hub ☆327 Updated last week
- ☆91 Updated last year
- ☆528 Updated 3 months ago
- For optimization algorithm research and development. ☆547 Updated this week
- Scalable and Performant Data Loading ☆335 Updated this week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆691 Updated this week
- seqax = sequence modeling + JAX ☆168 Updated 4 months ago
- ☆223 Updated 11 months ago
- Open-source framework for the research and development of foundation models. ☆611 Updated last week
- Minimal yet performant LLM examples in pure JAX ☆199 Updated 2 months ago
- Getting crystal-like representations with harmonic loss ☆192 Updated 7 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆132 Updated last year
- ☆68 Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆133 Updated 11 months ago
- ☆225 Updated last month
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆452 Updated last week
- Quantized LLM training in pure CUDA/C++. ☆216 Updated this week
- ☆200 Updated 3 months ago