TheMody / No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation
SaLSa Optimizer implementation (No learning rates needed)
☆30 · Updated last month
Alternatives and similar repositories for No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation
Users interested in No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation are comparing it to the libraries listed below.
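For context on the technique in the repo name: SALSA builds on Armijo line search, which picks each step size by backtracking instead of using a fixed learning rate. Below is a minimal sketch of the classic Armijo backtracking rule, not the SALSA implementation itself; the function names and constants are illustrative.

```python
import numpy as np

def armijo_step(f, grad_f, x, eta0=1.0, c=1e-4, beta=0.5, max_shrinks=30):
    """One steepest-descent step whose size is chosen by backtracking
    Armijo line search instead of a fixed learning rate."""
    g = grad_f(x)
    d = -g                      # steepest-descent direction
    fx = f(x)
    eta = eta0
    # Shrink eta until the Armijo sufficient-decrease condition holds:
    #   f(x + eta*d) <= f(x) + c * eta * g.d
    for _ in range(max_shrinks):
        if f(x + eta * d) <= fx + c * eta * g.dot(d):
            break
        eta *= beta
    return x + eta * d, eta

# Toy usage: minimize f(x) = ||x||^2 from x0 = [3, -4]
f = lambda x: float(x.dot(x))
grad_f = lambda x: 2.0 * x
x = np.array([3.0, -4.0])
for _ in range(20):
    x, eta = armijo_step(f, grad_f, x)
print(round(f(x), 6))  # prints 0.0
```

The repo's contribution (per its paper title) is making this line-search adaptation stable enough to drive training without any tuned learning rate.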
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆101 · Updated 6 months ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆103 · Updated 2 months ago
- An implementation of the PSGD Kron second-order optimizer for PyTorch ☆91 · Updated 2 months ago
- ICLR 2025 official implementation of "I-Con: A Unifying Framework for Representation Learning" ☆96 · Updated this week
- ☆29 · Updated last month
- Generative Modeling with Bayesian Sample Inference ☆22 · Updated last month
- Generate graph/data embeddings multiple ways ☆54 · Updated this week
- An improved implementation of Kolmogorov-Arnold networks ☆62 · Updated 3 weeks ago
- ☆81 · Updated last year
- ☆31 · Updated last year
- Training small GPT-2-style models using Kolmogorov-Arnold networks. ☆118 · Updated last year
- ☆30 · Updated 8 months ago
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆41 · Updated last year
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆91 · Updated 2 weeks ago
- Code from our practical deep dive using Mamba for information extraction ☆53 · Updated last year
- Implementations of attention with the softpick function, naive and FlashAttention-2 ☆77 · Updated last month
- Code and pretrained models for the paper "MatMamba: A Matryoshka State Space Model" ☆59 · Updated 7 months ago
- A HuggingFace-compatible small language model trainer. ☆75 · Updated 4 months ago
- NanoGPT speedrunning for the poor T4 enjoyers ☆66 · Updated 2 months ago
- ☆190 · Updated 6 months ago
- The official repo for Gradient Agreement Filtering (GAF). ☆24 · Updated 4 months ago
- Source code for the "Saving 77% of the Parameters in Large Language Models" technical report ☆30 · Updated 3 months ago
- Train a SmolLM-style LLM on FineWeb-Edu in JAX/Flax with an assortment of optimizers. ☆17 · Updated 3 months ago
- ☆31 · Updated last year
- Focused on fast experimentation and simplicity ☆74 · Updated 6 months ago
- Official implementation of the paper "ZClip: Adaptive Spike Mitigation for LLM Pre-Training" ☆127 · Updated 3 weeks ago
- The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al., with a few convenient wrappers for regression, in PyTorch ☆64 · Updated 3 weeks ago
- σ-GPT: A New Approach to Autoregressive Models ☆65 · Updated 10 months ago
- ☆98 · Updated 5 months ago
- ☆47 · Updated 7 months ago