TheMody / No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation
SALSA optimizer implementation (no learning rates needed)
☆28 · Updated last week
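For context on what a line-search-based optimizer does, below is a minimal sketch of Armijo backtracking line search on top of plain SGD in PyTorch. It is not the SALSA implementation from this repository; the function name `armijo_sgd_step`, its parameters, and the closure convention are illustrative assumptions.

```python
import torch

def armijo_sgd_step(params, closure, lr_init=1.0, c=0.1, beta=0.5, max_backtracks=10):
    """One SGD step whose step size is picked by Armijo backtracking line search.

    `closure` must re-evaluate the model and return the loss tensor without
    calling backward(); gradients are assumed to already be stored in p.grad.
    """
    params = [p for p in params if p.grad is not None]
    loss0 = closure().item()
    grads = [p.grad.detach().clone() for p in params]
    grad_sq = sum((g * g).sum().item() for g in grads)  # ||grad||^2

    lr = lr_init
    for _ in range(max_backtracks):
        # Trial step: p <- p - lr * grad
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(g, alpha=-lr)
        new_loss = closure().item()
        # Armijo (sufficient decrease) condition: f(x - lr*g) <= f(x) - c*lr*||g||^2
        if new_loss <= loss0 - c * lr * grad_sq:
            return lr, new_loss
        # Undo the trial step and shrink the step size
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(g, alpha=lr)
        lr *= beta
    return 0.0, loss0  # no acceptable step found; parameters left unchanged
```

A training loop would compute the loss, call `backward()`, then pass the parameters and a loss-only closure to `armijo_sgd_step`; SALSA's contribution is keeping the accepted step size stable across noisy mini-batches, which this toy version does not attempt to address.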
Alternatives and similar repositories for No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation:
Users interested in this repository are comparing it to the libraries listed below.
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆95 · Updated last month
- ☆41 · Updated 2 weeks ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆80 · Updated this week
- ☆78 · Updated 10 months ago
- ☆29 · Updated 9 months ago
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆114 · Updated 8 months ago
- This repository contains a better implementation of Kolmogorov-Arnold networks ☆60 · Updated 9 months ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks" ☆53 · Updated 3 months ago
- Implementation of the proposed Spline-Based Transformer from Disney Research ☆85 · Updated 3 months ago
- Official implementation of MetaTree: Learning a Decision Tree Algorithm with Transformers ☆104 · Updated 5 months ago
- Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model" ☆57 · Updated 2 months ago
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆72 · Updated 2 weeks ago
- Set of scripts to finetune LLMs ☆36 · Updated 10 months ago
- Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta ☆114 · Updated 2 weeks ago
- Collection of tests performed during the study of the new Kolmogorov-Arnold Neural Networks (KAN) ☆36 · Updated 4 months ago
- ☆31 · Updated 9 months ago
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya ☆107 · Updated 2 weeks ago
- PyTorch implementation of models from the Zamba2 series. ☆176 · Updated 3 weeks ago
- Interactive Variational Autoencoder (VAE) ☆43 · Updated 3 months ago
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆39 · Updated 8 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated 11 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆75 · Updated 2 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆119 · Updated last week
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆82 · Updated last month
- ☆54 · Updated 3 months ago
- The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation. ☆41 · Updated 5 months ago
- This is the official repo for Gradient Agreement Filtering (GAF). ☆22 · Updated 2 weeks ago