Code for the paper "Function-Space Learning Rates"
☆25Jun 3, 2025Updated 9 months ago
Alternatives and similar repositories for function-space-learning-rates-paper
Users who are interested in function-space-learning-rates-paper are comparing it to the repositories listed below
- Combining SOAP and MUON☆19Feb 11, 2025Updated last year
- [Oral; NeurIPS OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers☆15Feb 12, 2026Updated 2 weeks ago
- Schedule free optimiser implemented in JAX using Optimistix☆15May 29, 2024Updated last year
- Supporting code for the blog post on modular manifolds.☆117Sep 26, 2025Updated 5 months ago
- ☆67Mar 21, 2025Updated 11 months ago
- JAX Scalify: end-to-end scaled arithmetics☆18Oct 30, 2024Updated last year
- Code for "What really matters in matrix-whitening optimizers?"☆22Oct 31, 2025Updated 4 months ago
- ☆24Jun 4, 2024Updated last year
- ☆27May 3, 2024Updated last year
- ☆33Oct 4, 2024Updated last year
- ☆13Jun 3, 2024Updated last year
- Minimal (truly) muP implementation, consistent with the notation of the TP4 and TP5 papers☆14Jan 2, 2026Updated 2 months ago
- ☆12Jul 30, 2025Updated 7 months ago
- ☆15Mar 2, 2025Updated last year
- RWKV-7 mini☆12Mar 29, 2025Updated 11 months ago
- recipe for training fully-featured self-supervised image JEPA models☆12Jun 4, 2025Updated 9 months ago
- [ICLR 2025] SDTT: a simple and effective distillation method for discrete diffusion models☆47Updated this week
- [WIP] Transformer to embed Danbooru labelsets☆13Mar 31, 2024Updated last year
- nanoGPT using Equinox☆15Mar 3, 2023Updated 3 years ago
- ☆34Sep 10, 2024Updated last year
- ☆63Oct 3, 2024Updated last year
- ☆33Nov 4, 2024Updated last year
- Pokedex for LLMs☆14Apr 14, 2025Updated 10 months ago
- Estimate resources needed to train LLMs☆14Feb 10, 2026Updated 3 weeks ago
- train with kittens!☆63Oct 25, 2024Updated last year
- Unit Scaling demo and experimentation code☆16Mar 12, 2024Updated last year
- Code for implementing central flows☆43Sep 5, 2025Updated 5 months ago
- PyTorch implementation of StableMask (ICML'24)☆15Jun 27, 2024Updated last year
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆36Jun 7, 2024Updated last year
- WIP☆94Aug 13, 2024Updated last year
- Computing the greatest common divisor with transformers, source code for the paper https://arxiv.org/abs/2308.15594☆14Aug 11, 2025Updated 6 months ago
- ☆24Dec 11, 2024Updated last year
- ☆55Feb 24, 2026Updated last week
- ☆18Aug 24, 2024Updated last year
- Scaling Sparse Fine-Tuning to Large Language Models☆18Jan 31, 2024Updated 2 years ago
- Synthetic Alphabet Dataset☆19Mar 27, 2025Updated 11 months ago
- ☆20May 30, 2024Updated last year
- WeGeFT: Weight‑Generative Fine‑Tuning for Multi‑Faceted Efficient Adaptation of Large Models☆22Jul 10, 2025Updated 7 months ago
- ☆40Jan 5, 2024Updated 2 years ago