[Oral; NeurIPS OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
☆15 · Feb 12, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for mu_learned_optimization
Users who are interested in mu_learned_optimization are comparing it to the repositories listed below.
- Code for the paper "Function-Space Learning Rates" ☆25 · Jun 3, 2025 · Updated 8 months ago
- ☆24 · Sep 25, 2024 · Updated last year
- ☆47 · Jan 18, 2024 · Updated 2 years ago
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- Recipe for training fully-featured self-supervised image JEPA models ☆12 · Jun 4, 2025 · Updated 8 months ago
- ☆29 · Feb 27, 2024 · Updated 2 years ago
- PyTorch implementation of StableMask (ICML'24) ☆15 · Jun 27, 2024 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆51 · Updated this week
- ☆20 · May 30, 2024 · Updated last year
- Implementation for robust ViT and scaled attention ☆21 · Apr 4, 2025 · Updated 10 months ago
- ☆23 · Jun 18, 2024 · Updated last year
- A port of muP to JAX/Haiku ☆25 · Oct 23, 2022 · Updated 3 years ago
- An efficient implementation of learned optimizers in PyTorch ☆37 · Dec 2, 2025 · Updated 3 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Aug 20, 2024 · Updated last year
- Supporting code for the blog post on modular manifolds. ☆117 · Sep 26, 2025 · Updated 5 months ago
- ☆23 · Oct 15, 2024 · Updated last year
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆57 · Mar 10, 2025 · Updated 11 months ago
- ☆67 · Mar 21, 2025 · Updated 11 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Jun 6, 2024 · Updated last year
- ☆63 · Oct 3, 2024 · Updated last year
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens ☆25 · Nov 6, 2023 · Updated 2 years ago
- Masked Structural Growth for 2x Faster Language Model Pre-training ☆25 · Apr 28, 2024 · Updated last year
- Checkpointable dataset utilities for foundation model training ☆32 · Jan 29, 2024 · Updated 2 years ago
- ☆27 · May 3, 2024 · Updated last year
- ☆34 · Sep 10, 2024 · Updated last year
- ☆124 · May 28, 2024 · Updated last year
- ☆35 · Apr 12, 2024 · Updated last year
- BitLinear implementation ☆35 · Jan 1, 2026 · Updated 2 months ago
- ☆33 · Nov 4, 2024 · Updated last year
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer ☆41 · Jan 29, 2026 · Updated last month
- AdamW optimizer for bfloat16 models in pytorch 🔥. ☆39 · Jun 16, 2024 · Updated last year
- Fast and tiny NeRF implementation ☆41 · Dec 5, 2023 · Updated 2 years ago
- ☆10 · Aug 18, 2016 · Updated 9 years ago
- A python algorithm to change the pitch of the voice in real time ☆13 · Dec 13, 2020 · Updated 5 years ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆36 · Jun 7, 2024 · Updated last year
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ☆10 · Jul 19, 2024 · Updated last year
- ☆83 · Apr 16, 2024 · Updated last year
- ☆55 · Updated this week
- Linear Attention Sequence Parallelism (LASP) ☆88 · Jun 4, 2024 · Updated last year