nikhilvyas/SOAP_MUON

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/nikhilvyas/SOAP_MUON)

nikhilvyas / SOAP_MUON

Combining SOAP and MUON

☆25

Alternatives and similar repositories for SOAP_MUON

Users that are interested in SOAP_MUON are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

lliu606 / COSMOS
View on GitHub
☆20Feb 2, 2026Updated 5 months ago
bdusell / stack-attention
View on GitHub
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18Mar 15, 2024Updated 2 years ago
LIONS-EPFL / scion
View on GitHub
☆70Apr 8, 2026Updated 3 months ago
edwardmilsom / function-space-learning-rates-paper
View on GitHub
Code for the paper "Function-Space Learning Rates"
☆23Jun 3, 2025Updated last year
watcl-lab / positional_attention
View on GitHub
Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"
☆14May 26, 2025Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
nikhilvyas / SOAP
View on GitHub
☆275Dec 2, 2024Updated last year
NVIDIA-NeMo / Emerging-Optimizers
View on GitHub
☆226Updated this week
damek / specgd
View on GitHub
Code to generate figures of paper "When do spectral gradient updates help in deep learning?"
☆16Dec 3, 2025Updated 7 months ago
kazuki-irie / kv-memory-brain
View on GitHub
Official Code Repository for the paper "Key-value memory in the brain"
☆32Feb 25, 2025Updated last year
Unakar / Spectral-Sphere-Optimizer
View on GitHub
Spectral Sphere Optimizer
☆131Mar 23, 2026Updated 4 months ago
shawntan / SUT
View on GitHub
Repository for Sparse Universal Transformers
☆20Oct 23, 2023Updated 2 years ago
RadicalNumerics / spear
View on GitHub
Structured Primitives for Efficient Architecture Research
☆20Dec 22, 2025Updated 7 months ago
ULTRA-HOI / HOI4-ULTRA-Project
View on GitHub
Github Repository for the HOI4 ULTRA Project.
☆11Updated this week
ethansmith2000 / TransformerExperiments
View on GitHub
☆19Dec 4, 2025Updated 7 months ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
dmis-lab / Outlier-Safe-Pre-Training
View on GitHub
[ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
☆39Nov 4, 2025Updated 8 months ago
IDSIA / automated-cl
View on GitHub
Official repository for the paper "Automating Continual Learning"
☆20Jun 11, 2025Updated last year
proger / nanokitchen
View on GitHub
Parallel Associative Scan for Language Models
☆18Jan 8, 2024Updated 2 years ago
ali-vilab / matrix
View on GitHub
☆34Apr 8, 2025Updated last year
Dhanush123 / cs182
View on GitHub
Deep Neural Network course @ UC Berkeley, taken Spring 2019
☆10Dec 7, 2022Updated 3 years ago
yuezhouhu / 2by4-pretrain
View on GitHub
Efficient 2:4 sparse training algorithms and implementations
☆62Dec 8, 2024Updated last year
apple / ml-ademamix
View on GitHub
☆71Nov 15, 2024Updated last year
NoahAmsel / PolarExpress
View on GitHub
☆33Jul 6, 2026Updated 2 weeks ago
maximzubkov / fft-scan
View on GitHub
Efficient PScan implementation in PyTorch
☆17Jan 2, 2024Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
dayal-kalra / low-memory-adam
View on GitHub
☆14Mar 2, 2025Updated last year
Farseer-Scaling-Law / Farseer
View on GitHub
☆21Jun 12, 2025Updated last year
Eliyas0007 / Pytorch-Intention
View on GitHub
Unofficial implementation of paper : Exploring the Space of Key-Value-Query Models with Intention
☆12May 24, 2023Updated 3 years ago
dangxingyu / rnn-icrag
View on GitHub
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Apr 17, 2024Updated 2 years ago
berndprach / AOL
View on GitHub
Code for paper Almost-Orthogonal Layers for Efficient General-Purpose Lipschitz Networks
☆13Aug 9, 2022Updated 3 years ago
IDSIA / rtrl-elstm
View on GitHub
Official repository for the paper "Exploring the Promise and Limits of Real-Time Recurrent Learning" (ICLR 2024)
☆13Jun 11, 2025Updated last year
drbh / yamoe
View on GitHub
🔀 yet another mixture of experts
☆23Jun 5, 2026Updated last month
berlino / seq_icl
View on GitHub
☆54May 20, 2024Updated 2 years ago
ml-research / cna_modules
View on GitHub
Cluster-Normalize-Activate Modules
☆13Jan 13, 2025Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
kyleliang919 / Super_Muon
View on GitHub
☆68Mar 21, 2025Updated last year
Triang-jyed-driung / rwkv7mini
View on GitHub
RWKV-7 mini
☆12Mar 29, 2025Updated last year
qsh-zh / gDDIM
View on GitHub
[ICLR'23 Spotlight] gDDIM: analyze and accelerate general diffusion models, isotropic and non-isotropic
☆49Oct 21, 2023Updated 2 years ago
thib-s / flash-newton-schulz
View on GitHub
My attempt to improve the speed of the newton schulz algorithm, starting from the dion implementation.
☆38Apr 30, 2026Updated 2 months ago
cofe-ai / Mu-scaling
View on GitHub
Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales
☆32Jul 17, 2023Updated 3 years ago
stephenkyang / mean-reversion-pairs-trading
View on GitHub
manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices
☆11Jan 12, 2021Updated 5 years ago
BenjaminAster / WebGPU-Mandelbrot
View on GitHub
A GPU accelerated Mandelbrot viewer made using the new WebGPU API.
☆10Oct 26, 2023Updated 2 years ago