microsoft / dion
Dion optimizer algorithm
⭐ 361 · Updated last week
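For orientation, below is a minimal sketch of where an optimizer like Dion would slot into an ordinary PyTorch training loop. This page does not document dion's actual API, so the `from dion import Dion` swap point is noted only as an assumption, and a stock `torch.optim.AdamW` is used as the runnable stand-in.

```python
# Minimal PyTorch training-loop sketch showing where an optimizer such as
# Dion would plug in. The Dion import mentioned below is an assumption --
# this page does not show the repo's API -- so AdamW is the stand-in.
import torch
import torch.nn as nn

model = nn.Linear(512, 512)

# Hypothetical swap point: if the dion package exposes a standard
# torch.optim-style interface, `from dion import Dion` and
# `optimizer = Dion(model.parameters(), lr=1e-2)` would replace this line.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(10):
    x = torch.randn(32, 512)
    loss = model(x).pow(2).mean()  # dummy objective for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```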
Alternatives and similar repositories for dion
Users interested in dion are comparing it to the libraries listed below.
- 🧱 Modula software package · ⭐ 282 · Updated last month
- ⭐ 282 · Updated last year
- Simple & Scalable Pretraining for Neural Architecture Research · ⭐ 296 · Updated last month
- PyTorch Single Controller · ⭐ 435 · Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. · ⭐ 164 · Updated 3 months ago
- Supporting PyTorch FSDP for optimizers · ⭐ 83 · Updated 10 months ago
- Load compute kernels from the Hub · ⭐ 293 · Updated last week
- Efficient optimizers · ⭐ 265 · Updated last week
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. · ⭐ 290 · Updated 2 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch · ⭐ 95 · Updated 2 months ago
- ⭐ 222 · Updated last week
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds · ⭐ 305 · Updated 2 months ago
- Normalized Transformer (nGPT) · ⭐ 190 · Updated 10 months ago
- DeMo: Decoupled Momentum Optimization · ⭐ 192 · Updated 10 months ago
- Minimal yet performant LLM examples in pure JAX · ⭐ 181 · Updated 2 weeks ago
- Small Batch Size Training for Language Models · ⭐ 63 · Updated last week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs · ⭐ 650 · Updated this week
- ⭐ 91 · Updated last year
- ⭐ 264 · Updated this week
- Implementation of Diffusion Transformer (DiT) in JAX · ⭐ 292 · Updated last year
- seqax = sequence modeling + JAX · ⭐ 167 · Updated 2 months ago
- Open-source framework for the research and development of foundation models. · ⭐ 466 · Updated last week
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" · ⭐ 247 · Updated 8 months ago
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism. · ⭐ 95 · Updated 2 weeks ago
- SIMD quantization kernels · ⭐ 87 · Updated last month
- Getting crystal-like representations with harmonic loss · ⭐ 194 · Updated 6 months ago
- ⭐ 173 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" · ⭐ 85 · Updated last month
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ⭐ 189 · Updated 3 months ago
- ⭐ 811 · Updated last week