TheMody / No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation
SaLSa Optimizer implementation (No learning rates needed)
☆28 · Updated 2 months ago
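SALSA builds on Armijo-type line searches, which pick a step size at each iteration by shrinking a trial step until a sufficient-decrease condition holds, so no fixed learning rate is needed. As a rough illustration only (a classic backtracking Armijo search, not the SALSA repository's actual stabilized variant; all names and defaults below are assumptions):

```python
import numpy as np

def armijo_backtracking(f, grad_f, x, alpha0=1.0, beta=0.5, c=1e-4, max_iter=50):
    """Backtracking line search under the Armijo sufficient-decrease condition.

    Shrinks alpha by factor beta until
        f(x - alpha * g) <= f(x) - c * alpha * ||g||^2,
    where g = grad_f(x). Hypothetical helper for illustration only.
    """
    g = grad_f(x)
    fx = f(x)
    alpha = alpha0
    for _ in range(max_iter):
        # Accept the first step that decreases f by at least c * alpha * ||g||^2.
        if f(x - alpha * g) <= fx - c * alpha * np.dot(g, g):
            return alpha
        alpha *= beta
    return alpha

# Toy usage: quadratic f(x) = ||x||^2 with gradient 2x.
f = lambda x: float(np.dot(x, x))
grad = lambda x: 2.0 * x
x = np.array([3.0, -4.0])
step = armijo_backtracking(f, grad, x)
```

The accepted step is guaranteed to decrease the loss; SALSA's contribution, per the repository description, is making this kind of search stable enough to replace hand-tuned learning rates in deep-learning training.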
Alternatives and similar repositories for No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation:
Users interested in No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation are comparing it to the libraries listed below.
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆95 · Updated 3 weeks ago
- A HuggingFace-compatible Small Language Model trainer. ☆74 · Updated 3 months ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆46 · Updated 2 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆76 · Updated 8 months ago
- ☆78 · Updated 9 months ago
- PyTorch implementation of models from the Zamba2 series. ☆168 · Updated last month
- ☆49 · Updated 4 months ago
- The official repository for "HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction". ☆31 · Updated this week
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆69 · Updated last month
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆113 · Updated 7 months ago
- This repository contains a better implementation of Kolmogorov-Arnold networks. ☆59 · Updated 8 months ago
- NLP with Rust for Python 🦀🐍 ☆60 · Updated 7 months ago
- Implementation of a Light Recurrent Unit in PyTorch. ☆47 · Updated 3 months ago
- ☆30 · Updated 3 months ago
- Gradient Boosting Reinforcement Learning (GBRL) ☆95 · Updated last month
- ☆34 · Updated 4 months ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆81 · Updated 3 weeks ago
- ☆49 · Updated 10 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆65 · Updated 8 months ago
- Collection of autoregressive model implementations ☆76 · Updated last week
- ☆58 · Updated 10 months ago
- ☆52 · Updated 2 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆178 · Updated 4 months ago
- Implementation of the proposed Spline-Based Transformer from Disney Research ☆85 · Updated 2 months ago
- Set of scripts to finetune LLMs ☆36 · Updated 9 months ago
- ☆149 · Updated 5 months ago
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆98 · Updated last month
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆14 · Updated 2 weeks ago
- ☆146 · Updated last month