☆67 · Mar 21, 2025 · Updated 11 months ago
Alternatives and similar repositories for Super_Muon
Users that are interested in Super_Muon are comparing it to the libraries listed below
- Code for the paper "Function-Space Learning Rates" ☆25 · Jun 3, 2025 · Updated 9 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆239 · Jun 15, 2025 · Updated 8 months ago
- ☆19 · Dec 4, 2025 · Updated 3 months ago
- Combining SOAP and MUON ☆19 · Feb 11, 2025 · Updated last year
- RWKV-7 mini ☆12 · Mar 29, 2025 · Updated 11 months ago
- ☆15 · Mar 2, 2025 · Updated last year
- [NeurIPS 2024] Low rank memory efficient optimizer without SVD ☆33 · Jul 1, 2025 · Updated 8 months ago
- [Oral; NeurIPS OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers ☆15 · Feb 12, 2026 · Updated 3 weeks ago
- Checkpointable dataset utilities for foundation model training ☆32 · Jan 29, 2024 · Updated 2 years ago
- Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics ☆71 · Jan 13, 2026 · Updated last month
- H-Net Dynamic Hierarchical Architecture ☆81 · Sep 11, 2025 · Updated 5 months ago
- DeMo: Decoupled Momentum Optimization ☆198 · Dec 2, 2024 · Updated last year
- ☆34 · Sep 10, 2024 · Updated last year
- Fast modular code to create and train cutting edge LLMs ☆68 · May 16, 2024 · Updated last year
- ☆20 · May 30, 2024 · Updated last year
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆56 · Dec 4, 2024 · Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 · Mar 7, 2025 · Updated 11 months ago
- Supporting code for the blog post on modular manifolds. ☆117 · Sep 26, 2025 · Updated 5 months ago
- Efficient PScan implementation in PyTorch ☆17 · Jan 2, 2024 · Updated 2 years ago
- Here we will test various linear attention designs. ☆62 · Apr 25, 2024 · Updated last year
- ☆44 · Nov 1, 2025 · Updated 4 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆36 · Jun 7, 2024 · Updated last year
- ☆27 · Feb 26, 2026 · Updated last week
- Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. ☆87 · Updated this week
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models ☆43 · Mar 11, 2025 · Updated 11 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆140 · Feb 25, 2026 · Updated last week
- NanoGPT-speedrunning for the poor T4 enjoyers ☆73 · Apr 22, 2025 · Updated 10 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆110 · Oct 11, 2025 · Updated 4 months ago
- RADLADS training code ☆37 · May 7, 2025 · Updated 9 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)