ironjr / grokfast
Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
☆516 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for grokfast
- Annotated version of the Mamba paper ☆457 · Updated 8 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆172 · Updated 2 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆259 · Updated 2 weeks ago
- A repository for log-time feedforward networks ☆216 · Updated 7 months ago
- Simple, minimal implementation of the Mamba SSM in one PyTorch file. Using logcumsumexp (Heisen sequence). ☆102 · Updated last month
- Implementation of Diffusion Transformer (DiT) in JAX ☆252 · Updated 5 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆113 · Updated this week
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆804 · Updated 3 months ago
- Open weights language model from Google DeepMind, based on Griffin. ☆607 · Updated 4 months ago
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆108 · Updated 5 months ago
- Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆355 · Updated last week
- Schedule-Free Optimization in PyTorch ☆1,900 · Updated 2 weeks ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆477 · Updated 3 weeks ago
- Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793 ☆328 · Updated 3 weeks ago
- Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines. ☆349 · Updated 6 months ago
- Cost-aware hyperparameter tuning algorithm ☆124 · Updated 4 months ago
- Code repository for Black Mamba ☆232 · Updated 9 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆483 · Updated 3 weeks ago
- Liquid Structural State-Space Models ☆317 · Updated 9 months ago
- ☆197 · Updated 4 months ago
- For optimization algorithm research and development. ☆451 · Updated this week
- Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" ☆338 · Updated this week
- A JAX-based library for designing and training transformer models from scratch. ☆276 · Updated 2 months ago
- Some preliminary explorations of Mamba's context scaling. ☆191 · Updated 9 months ago
- ☆292 · Updated 5 months ago
- 94% on CIFAR-10 in 2.6 seconds 💨 96% in 27 seconds ☆178 · Updated last week
- Universal Tensor Operations in Einstein-Inspired Notation for Python. ☆328 · Updated last month
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆537 · Updated 6 months ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆334 · Updated 3 months ago
- Normalized Transformer (nGPT) ☆87 · Updated this week