TheMody / No-learning-rates-needed-Introducing-SALSA-Stable-Armijo-Line-Search-Adaptation
SaLSa Optimizer implementation (No learning rates needed)
☆27 Updated last week
Related projects:
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆82 Updated 3 weeks ago
- This repository contains a better implementation of Kolmogorov-Arnold networks ☆59 Updated 4 months ago
- ☆73 Updated 5 months ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆153 Updated last week
- PyTorch implementation of models from the Zamba2 series. ☆63 Updated last month
- ☆25 Updated 4 months ago
- ☆71 Updated 3 weeks ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ☆29 Updated this week
- Gradient Boosting Reinforcement Learning ☆75 Updated last week
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆105 Updated 3 months ago
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆38 Updated 3 months ago
- σ-GPT: A New Approach to Autoregressive Models ☆53 Updated last month
- A HuggingFace compatible xLSTM trainer. ☆57 Updated last week
- Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta ☆103 Updated last week
- ☆23 Updated last month
- ☆59 Updated last week
- Official implementation of MetaTree: Learning a Decision Tree Algorithm with Transformers ☆97 Updated last week
- ☆30 Updated 4 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆61 Updated 4 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆57 Updated 4 months ago
- Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊 ☆106 Updated this week
- ☆42 Updated 3 weeks ago
- Library for Jacobian descent with PyTorch. It enables optimization of neural networks with multiple losses (e.g. multi-task learning). ☆126 Updated this week
- Implementation of Agent Attention in Pytorch ☆83 Updated 2 months ago
- ☆137 Updated last month
- ☆53 Updated 8 months ago
- A library for detecting problematic data segments in structured and unstructured data with few lines of code. ☆61 Updated 8 months ago
- Implementation of the Mamba SSM with hf_integration. ☆55 Updated 2 weeks ago
- This is the code that went into our practical dive using mamba as information extraction ☆50 Updated 8 months ago
- Kolmogorov-Arnold Transformer: A PyTorch Implementation with CUDA kernel ☆221 Updated this week