LIONS-EPFL/scion

☆70

Alternatives and similar repositories for scion

Users interested in scion are comparing it to the libraries listed below.

damek / specgd
Code to generate figures of paper "When do spectral gradient updates help in deep learning?"
☆16 · Dec 3, 2025 · Updated 7 months ago
nikhilvyas / SOAP_MUON
Combining SOAP and MUON
☆23 · Feb 11, 2025 · Updated last year
SDLAML / disco
☆16 · Dec 11, 2025 · Updated 7 months ago
nikhilvyas / SOAP
☆275 · Dec 2, 2024 · Updated last year
NoahAmsel / PolarExpress
☆33 · Jul 6, 2026 · Updated 2 weeks ago
xie-lab-ml / Mano-Restriking-Manifold-Optimization-for-LLM-Training
The official code of "Mano: Restriking Manifold Optimization for LLM Training".
☆25 · Jun 1, 2026 · Updated last month
Unakar / Spectral-Sphere-Optimizer
Spectral Sphere Optimizer
☆130 · Mar 23, 2026 · Updated 4 months ago
OptimAI-Lab / Minimalist_LLM_Pretraining
[ICML 2026] Memory-Efficient LLM Pretraining via Minimalist Optimizer Design
☆22 · May 26, 2026 · Updated last month
Farseer-Scaling-Law / Farseer
☆21 · Jun 12, 2025 · Updated last year
epfml / llm-optimizer-benchmark
Benchmarking Optimizers for LLM Pretraining
☆60 · May 3, 2026 · Updated 2 months ago
thib-s / flash-newton-schulz
My attempt to improve the speed of the newton schulz algorithm, starting from the dion implementation.
☆38 · Apr 30, 2026 · Updated 2 months ago
apple / ml-ademamix
☆71 · Nov 15, 2024 · Updated last year
kyleliang919 / Super_Muon
☆68 · Mar 21, 2025 · Updated last year
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 · Jun 3, 2025 · Updated last year
modula-systems / modula
🧱 Modula software package
☆337 · Aug 18, 2025 · Updated 11 months ago
Arongil / lipschitz-transformers
Don't just regulate gradients like in Muon, regulate the weights too
☆32 · Jul 30, 2025 · Updated 11 months ago
kvfrans / matrix-whitening
Code for "What really matters in matrix-whitening optimizers?"
☆25 · Oct 31, 2025 · Updated 8 months ago
vectozavr / llm-hessian
Using PyTorch autograd to compute Hessian of Perplexity for Large Language Models
☆29 · Apr 17, 2025 · Updated last year
NVIDIA-NeMo / Emerging-Optimizers
☆215 · Updated this week
zyushun / hessian-spectrum
Code for the paper: Why Transformers Need Adam: A Hessian Perspective
☆65 · Mar 11, 2025 · Updated last year
zyushun / Adam-mini
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
☆457 · May 13, 2025 · Updated last year
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆126 · Sep 26, 2025 · Updated 9 months ago
microsoft / dion
Dion optimizer algorithm
☆496 · Jul 12, 2026 · Updated last week
CongWeilin / DGCN
☆10 · Aug 13, 2021 · Updated 4 years ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆199 · Jan 19, 2026 · Updated 6 months ago
Dao-AILab / gram-newton-schulz
Fast Polar Decomposition for Muon
☆167 · Jul 2, 2026 · Updated 3 weeks ago
deep-spin / adasplash
AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention)
☆46 · May 20, 2026 · Updated 2 months ago
wz1119 / KromHC
[ICML 2026] Implementation for KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices
☆15 · Jul 13, 2026 · Updated last week
aHapBean / xHC
[Tech Report] Expanded Hyper-Connections
☆49 · Updated this week
allenai / signal-and-noise
Measuring the Signal to Noise Ratio in Language Model Evaluation
☆31 · Aug 19, 2025 · Updated 11 months ago
leloykun / adaptive-muon
A single-line modification to any (dualizer-based) optimizer that allows the optimizer to adapt to the scale of the gradients as they cha…
☆19 · Jan 11, 2025 · Updated last year
mekty2012 / Deep-Learning-Theory
Repository for Deep Learning Theory papers
☆15 · Jan 24, 2024 · Updated 2 years ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆258 · Jun 15, 2025 · Updated last year
wdlctc / delta-attention-residuals-code
Delta Attention Residuals - supplementary code and pretrained models
☆40 · May 20, 2026 · Updated 2 months ago
bentherien / mu_learned_optimization
[Poster; ICLR 2026] [Oral; Neurips OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
☆16 · Apr 15, 2026 · Updated 3 months ago
ethansmith2000 / TransformerExperiments
☆19 · Dec 4, 2025 · Updated 7 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆93 · Oct 30, 2024 · Updated last year
KellerJordan / Muon
Muon is an optimizer for hidden layers in neural networks
☆2,731 · May 24, 2026 · Updated 2 months ago
MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,510 · Aug 3, 2025 · Updated 11 months ago