ironjr / grokfast
Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
☆539 · Updated 7 months ago
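The paper's core idea is to speed up grokking by amplifying the slow-varying (low-frequency) component of the stochastic gradients before each optimizer step. Below is a minimal sketch of that idea using an exponential moving average filter, assuming a standard PyTorch training loop; the function name `gradfilter_ema_sketch` and the hyperparameter defaults are illustrative assumptions, not the repository's actual API.

```python
import torch

def gradfilter_ema_sketch(model, ema_grads=None, alpha=0.98, lamb=2.0):
    # Sketch of the "amplify slow gradients" idea (assumed defaults, not the official API):
    # keep an exponential moving average of each parameter's gradient and add the
    # amplified slow component back onto the current gradient in place.
    if ema_grads is None:
        ema_grads = {
            name: param.grad.detach().clone()
            for name, param in model.named_parameters()
            if param.grad is not None
        }
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        # EMA update: mu_t = alpha * mu_{t-1} + (1 - alpha) * g_t
        ema_grads[name].mul_(alpha).add_(param.grad.detach(), alpha=1 - alpha)
        # Amplify the slow component: g_t <- g_t + lamb * mu_t
        param.grad.add_(ema_grads[name], alpha=lamb)
    return ema_grads
```

In a training loop, this would be called between `loss.backward()` and `optimizer.step()`, carrying `ema_grads` across iterations so the moving average accumulates over time.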
Alternatives and similar repositories for grokfast:
Users interested in grokfast are comparing it to the repositories listed below.
- Annotated version of the Mamba paper · ☆470 · Updated 11 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. · ☆178 · Updated 4 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" · ☆833 · Updated last week
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI · ☆271 · Updated 2 months ago
- Open weights language model from Google DeepMind, based on Griffin. · ☆614 · Updated 6 months ago
- Code repository for Black Mamba · ☆234 · Updated 11 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead · ☆219 · Updated 3 weeks ago
- For optimization algorithm research and development. · ☆486 · Updated last week
- ☆495 · Updated 6 months ago
- Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence). · ☆105 · Updated 3 months ago
- Unofficial implementation of Titans, SOTA memory for transformers, in Pytorch · ☆891 · Updated this week
- ☆149 · Updated last month
- ☆296 · Updated 7 months ago
- Normalized Transformer (nGPT) · ☆146 · Updated 2 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. · ☆280 · Updated 3 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch · ☆498 · Updated 3 months ago
- 94% on CIFAR-10 in 2.6 seconds 💨 96% in 27 seconds · ☆196 · Updated 2 months ago
- Implementation of Diffusion Transformer (DiT) in JAX · ☆261 · Updated 7 months ago
- DeMo: Decoupled Momentum Optimization · ☆171 · Updated last month
- supporting pytorch FSDP for optimizers · ☆75 · Updated last month
- A repository for log-time feedforward networks · ☆217 · Updated 9 months ago
- Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" · ☆412 · Updated last month
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" · ☆285 · Updated 2 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. · ☆510 · Updated this week
- Efficient optimizers · ☆151 · Updated this week
- Training Large Language Model to Reason in a Continuous Latent Space · ☆735 · Updated this week
- ☆203 · Updated 6 months ago
- Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.… · ☆75 · Updated 3 weeks ago
- [ICLR2025] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters · ☆482 · Updated this week
- Helpful tools and examples for working with flex-attention · ☆603 · Updated this week