microsoft / dion

Dion optimizer algorithm

☆384

Alternatives and similar repositories for dion

Users who are interested in dion are comparing it to the libraries listed below.

modula-systems / modula
🧱 Modula software package
☆303 · Updated 3 months ago
google-deepmind / nanodo
☆285 · Updated last year
KellerJordan / cifar10-airbench
CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds
☆326 · Updated last week
changjonathanc / flex-nano-vllm
FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
☆303 · Updated 2 weeks ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆300 · Updated 3 weeks ago
HomebrewML / HeavyBall
Efficient optimizers
☆275 · Updated last week
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192 · Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆173 · Updated 4 months ago
ethansmith2000 / fsdp_optimizers
Supporting PyTorch FSDP for optimizers
☆84 · Updated 11 months ago
kvfrans / jax-diffusion-transformer
Implementation of Diffusion Transformer (DiT) in JAX
☆296 · Updated last year
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆102 · Updated last month
meta-pytorch / torchforge
PyTorch-native post-training at scale
☆532 · Updated this week
huggingface / kernels
Load compute kernels from the Hub
☆327 · Updated last week
cloneofsimo / min-fsdp
☆91 · Updated last year
Quentin-Anthony / torch-profiling-tutorial
☆528 · Updated 3 months ago
facebookresearch / optimizers
For optimization algorithm research and development.
☆547 · Updated this week
facebookresearch / spdl
Scalable and Performant Data Loading
☆335 · Updated this week
jax-ml / scaling-book
Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
☆691 · Updated this week
MatX-inc / seqax
seqax = sequence modeling + JAX
☆168 · Updated 4 months ago
nikhilvyas / SOAP
☆223 · Updated 11 months ago
marin-community / marin
Open-source framework for the research and development of foundation models.
☆611 · Updated last week
jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆199 · Updated 2 months ago
KindXiaoming / grow-crystals
Getting crystal-like representations with harmonic loss
☆192 · Updated 7 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training
☆132 · Updated last year
apple / ml-ademamix
☆68 · Updated last year
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆133 · Updated 11 months ago
huggingface / picotron_tutorial
☆225 · Updated last month
meta-pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆452 · Updated last week
IST-DASLab / llmq
Quantized LLM training in pure CUDA/C++.
☆216 · Updated this week
iliao2345 / CompressARC
☆200 · Updated 3 months ago