epfml/llm-optimizer-benchmark

Benchmarking Optimizers for LLM Pretraining

☆60

Alternatives and similar repositories for llm-optimizer-benchmark

Users who are interested in llm-optimizer-benchmark are comparing it to the repositories listed below.

tianshijing / ScalingOpt
ScalingOpt - Optimization Community
☆104 · Jun 1, 2026 · Updated last month
LIONS-EPFL / scion
☆70 · Apr 8, 2026 · Updated 3 months ago
OptimAI-Lab / Minimalist_LLM_Pretraining
[ICML 2026] Memory-Efficient LLM Pretraining via Minimalist Optimizer Design
☆21 · May 26, 2026 · Updated last month
Unakar / Spectral-Sphere-Optimizer
Spectral Sphere Optimizer
☆130 · Mar 23, 2026 · Updated 3 months ago
dayal-kalra / low-memory-adam
☆14 · Mar 2, 2025 · Updated last year
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆118 · Nov 7, 2025 · Updated 8 months ago
nikhilvyas / SOAP_MUON
Combining SOAP and MUON
☆22 · Feb 11, 2025 · Updated last year
zhehangdu / Newton-Muon
The Newton-Muon optimizer
☆30 · Jun 5, 2026 · Updated last month
xie-lab-ml / Mano-Restriking-Manifold-Optimization-for-LLM-Training
The official code of "Mano: Restriking Manifold Optimization for LLM Training".
☆25 · Jun 1, 2026 · Updated last month
kvfrans / matrix-whitening
Code for "What really matters in matrix-whitening optimizers?"
☆25 · Oct 31, 2025 · Updated 8 months ago
nikhilvyas / SOAP
☆273 · Dec 2, 2024 · Updated last year
tilde-research / aurora-release
Aurora optimizer release
☆150 · Updated this week
nanduruganesh / flash-msa
☆22 · Jul 13, 2026 · Updated last week
zqOuO / GWT
☆13 · May 4, 2026 · Updated 2 months ago
Gunale0926 / Grams
Grams: Gradient Descent with Adaptive Momentum Scaling (ICLR 2025 Workshop)
☆17 · Mar 6, 2025 · Updated last year
Dao-AILab / gram-newton-schulz
Fast Polar Decomposition for Muon
☆166 · Jul 2, 2026 · Updated 2 weeks ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆199 · Jan 19, 2026 · Updated 6 months ago
fjzzq2002 / WeightWatch
Official Repository of Paper "Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs"
☆15 · Sep 25, 2025 · Updated 9 months ago
fiveai / understanding_safety_finetuning
Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)
☆12 · Oct 31, 2024 · Updated last year
timmytonga / sn-sm
Subset-Norm and Subset-Momentum. This repo is built on top of https://github.com/jiaweizzhao/GaLore.
☆19 · Jul 9, 2025 · Updated last year
Sphere-AI-Lab / PEFT-Arena
Official repository of PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective
☆26 · Jun 13, 2026 · Updated last month
baekrok / DASH-Direction-Aware-SHrinking
☆14 · Dec 13, 2024 · Updated last year
zhentingqi / evolm
☆75 · Jun 23, 2025 · Updated last year
bneyshabur / generalization-bounds
Computing various measures and generalization bounds on convolutional and fully connected networks
☆35 · Dec 13, 2018 · Updated 7 years ago
Model-GLUE / Model-GLUE
☆18 · Aug 19, 2024 · Updated last year
anh-tong / nanoGPT-equinox
nanoGPT using Equinox
☆15 · Mar 3, 2023 · Updated 3 years ago
chengxiang / LinearTransformer
Pytorch code for experiments on Linear Transformers
☆24 · Jan 12, 2024 · Updated 2 years ago
Sphere-AI-Lab / poet
Implementation for POET and POET-X for LLM pretraining
☆38 · Jun 9, 2026 · Updated last month
PrimeIntellect-ai / diloco_simple
torch implementation of diloco
☆24 · Updated this week
nanowell / AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
☆188 · Sep 12, 2024 · Updated last year
IST-DASLab / CrAM
Code for reproducing the results from "CrAM: A Compression-Aware Minimizer" accepted at ICLR 2023
☆10 · Mar 1, 2023 · Updated 3 years ago
microsoft / accbpg
Accelerated Bregman Proximal Gradient Methods
☆29 · Jun 12, 2023 · Updated 3 years ago
daeveraert / gradient-information-optimization
Implementation of Gradient Information Optimization (GIO) for effective and scalable training data selection
☆14 · Jun 22, 2023 · Updated 3 years ago
shikaiqiu / compute-better-spent
☆63 · Oct 3, 2024 · Updated last year
TsinghuaC3I / SoRA
[EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models
☆87 · Mar 5, 2024 · Updated 2 years ago
aadityasingh / icl-dynamics
☆26 · Feb 20, 2026 · Updated 5 months ago
Kernel-Machines / kermac
Pytorch routines for (Ker)nel (Mac)hines
☆12 · Oct 10, 2025 · Updated 9 months ago
APRIL-AIGC / awesome-optimizer
Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations
☆31 · Jul 2, 2026 · Updated 2 weeks ago
kyleliang919 / C-Optim
[ICLR 2026] When it comes to optimizers, it's always better to be safe than sorry
☆417 · Sep 26, 2025 · Updated 9 months ago