edwardmilsom/function-space-learning-rates-paper

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/edwardmilsom/function-space-learning-rates-paper)

edwardmilsom / function-space-learning-rates-paper

Code for the paper "Function-Space Learning Rates"

☆23

Alternatives and similar repositories for function-space-learning-rates-paper

Users that are interested in function-space-learning-rates-paper are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

kyleliang919 / Super_Muon
View on GitHub
☆68Mar 21, 2025Updated last year
Laz4rz / mup
View on GitHub
Minimal (truly) muP implementation, consistent with TP4 and TP5 papers notation
☆14Jan 2, 2026Updated 6 months ago
nikhilvyas / SOAP_MUON
View on GitHub
Combining SOAP and MUON
☆22Feb 11, 2025Updated last year
bentherien / mu_learned_optimization
View on GitHub
[Poster; ICLR 2026] [Oral; Neurips OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
☆16Apr 15, 2026Updated 3 months ago
cloneofsimo / project_RF
View on GitHub
☆24Jun 4, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
kyrie-23 / linear_task_arithmetic
View on GitHub
☆12Jul 30, 2025Updated 11 months ago
Arongil / lipschitz-transformers
View on GitHub
Don't just regulate gradients like in Muon, regulate the weights too
☆32Jul 30, 2025Updated 11 months ago
johnryan465 / pscan
View on GitHub
☆40Jan 5, 2024Updated 2 years ago
centralflows / centralflows
View on GitHub
Code for implementing central flows
☆48Sep 5, 2025Updated 10 months ago
thinking-machines-lab / manifolds
View on GitHub
Supporting code for the blog post on modular manifolds.
☆126Sep 26, 2025Updated 9 months ago
SDLAML / disco
View on GitHub
☆16Dec 11, 2025Updated 7 months ago
cloneofsimo / min-max-in-dit
View on GitHub
☆27May 3, 2024Updated 2 years ago
timweiland / gp-fvm
View on GitHub
Probabilistic Finite Volume Method based on Affine Gaussian Process inference
☆11Jun 10, 2024Updated 2 years ago
dayal-kalra / low-memory-adam
View on GitHub
☆14Mar 2, 2025Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
cloneofsimo / scaling-guide
View on GitHub
WIP
☆96Aug 13, 2024Updated last year
microsoft / mutransformers
View on GitHub
some common Huggingface transformers in maximal update parametrization (µP)
☆88Mar 14, 2022Updated 4 years ago
berndprach / AOL
View on GitHub
Code for paper Almost-Orthogonal Layers for Efficient General-Purpose Lipschitz Networks
☆13Aug 9, 2022Updated 3 years ago
shikaiqiu / compute-better-spent
View on GitHub
☆63Oct 3, 2024Updated last year
anh-tong / nanoGPT-equinox
View on GitHub
nanoGPT using Equinox
☆15Mar 3, 2023Updated 3 years ago
ethansmith2000 / TransformerExperiments
View on GitHub
☆19Dec 4, 2025Updated 7 months ago
cloneofsimo / ptar
View on GitHub
☆13Jun 3, 2024Updated 2 years ago
spacebel / geozarr-openlayers
View on GitHub
GeoZarr extension for OpenLayers
☆12Jun 27, 2024Updated 2 years ago
AndPotap / einsum-search
View on GitHub
☆34Oct 4, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
sramshetty / mixture-of-depths
View on GitHub
An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆35Jun 7, 2024Updated 2 years ago
NoahAmsel / PolarExpress
View on GitHub
☆33Jul 6, 2026Updated 2 weeks ago
graphcore-research / jax-scalify
View on GitHub
JAX Scalify: end-to-end scaled arithmetics
☆18Oct 30, 2024Updated last year
thib-s / flash-newton-schulz
View on GitHub
My attempt to improve the speed of the newton schulz algorithm, starting from the dion implementation.
☆38Apr 30, 2026Updated 2 months ago
graphcore-research / unit-scaling
View on GitHub
A library for unit scaling in PyTorch
☆135Jul 11, 2025Updated last year
HazyResearch / train-tk
View on GitHub
train with kittens!
☆67Oct 25, 2024Updated last year
LIONS-EPFL / scion
View on GitHub
☆70Apr 8, 2026Updated 3 months ago
stephenkyang / mean-reversion-pairs-trading
View on GitHub
manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices
☆11Jan 12, 2021Updated 5 years ago
cloneofsimo / ezmup
View on GitHub
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only, dont use it for Adam
☆88Jul 28, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
attentionmech / dex
View on GitHub
Pokedex for LLMs
☆14Apr 14, 2025Updated last year
flukeskywalker / nanoDD
View on GitHub
Simple Scalable Discrete Diffusion for text in PyTorch
☆37Sep 27, 2024Updated last year
facebookresearch / GCD
View on GitHub
Computing the greatest common divisor with transformers, source code for the paper https//arxiv.org/abs/2308.15594
☆14Aug 11, 2025Updated 11 months ago
fal-ai-community / alphabet-dataset
View on GitHub
Synthetic Alphabet Dataset
☆19Mar 27, 2025Updated last year
cloneofsimo / zeroshampoo
View on GitHub
☆33Sep 10, 2024Updated last year
GallagherCommaJack / modulax
View on GitHub
☆18Aug 24, 2024Updated last year
hal-314 / fastai-batch-size-finder
View on GitHub
Implementation of OpenAI paper with Simple Noise Scale on Fastai V2
☆19Apr 16, 2021Updated 5 years ago