TheMody / No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation
SaLSa Optimizer implementation (No learning rates needed)
☆30 Updated 2 weeks ago
Alternatives and similar repositories for No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation
Users who are interested in No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation are comparing it to the libraries listed below.
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆100 Updated 5 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆91 Updated 2 months ago
- ICLR 2025 - official implementation for "I-Con: A Unifying Framework for Representation Learning" ☆91 Updated 3 weeks ago
- ☆80 Updated last year
- ☆45 Updated 4 months ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆98 Updated last month
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆90 Updated 2 months ago
- ☆150 Updated 9 months ago
- ☆29 Updated last month
- This is the code that went into our practical dive using mamba as information extraction ☆53 Updated last year
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆117 Updated last year
- Implementation of a Light Recurrent Unit in Pytorch ☆47 Updated 8 months ago
- A HuggingFace compatible Small Language Model trainer. ☆75 Updated 4 months ago
- ☆95 Updated 4 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆139 Updated 2 weeks ago
- Generate graph/data embeddings multiple ways ☆53 Updated last week
- Induce brain-like topographic structure in your neural networks ☆62 Updated 2 weeks ago
- ☆71 Updated 9 months ago
- Focused on fast experimentation and simplicity ☆73 Updated 5 months ago
- ☆61 Updated 6 months ago
- ☆31 Updated last year
- A State-Space Model with Rational Transfer Function Representation. ☆78 Updated last year
- ☆30 Updated 7 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆183 Updated 8 months ago
- This repository contains a better implementation of Kolmogorov-Arnold networks ☆61 Updated this week
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆83 Updated 3 months ago
- Generative Modeling with Bayesian Sample Inference ☆22 Updated 2 weeks ago
- Lossily compress representation vectors using product quantization ☆54 Updated last month
- Explorations into whether a transformer with RL can direct a genetic algorithm to converge faster ☆70 Updated 2 weeks ago
- The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch ☆63 Updated last month