cloneofsimo / minSAE
☆25Updated 2 weeks ago
Alternatives and similar repositories for minSAE:
Users who are interested in minSAE are comparing it to the libraries listed below.
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam☆71Updated 4 months ago
- supporting pytorch FSDP for optimizers☆68Updated last week
- ☆18Updated 2 months ago
- ☆74Updated 5 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs.☆85Updated 3 weeks ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training☆113Updated 8 months ago
- WIP☆89Updated 4 months ago
- ☆31Updated 3 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch☆19Updated 2 weeks ago
- ☆21Updated 5 months ago
- ☆53Updated 10 months ago
- ☆50Updated last month
- ☆48Updated 2 months ago
- ☆31Updated last month
- Minimal but scalable implementation of large language models in JAX☆27Updated last month
- ☆138Updated 2 weeks ago
- ☆51Updated 11 months ago
- An implementation of the Llama architecture, to instruct and delight☆21Updated 4 months ago
- These papers will provide unique insightful concepts that will broaden your perspective on neural networks and deep learning☆46Updated last year
- Collection of autoregressive model implementations☆67Updated 3 weeks ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods.☆30Updated this week
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"☆89Updated last week
- ☆51Updated 6 months ago
- ☆26Updated 7 months ago
- Efficient optimizers☆126Updated this week
- ☆64Updated 3 months ago
- ☆60Updated last month
- Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule☆45Updated this week
- ☆29Updated 3 weeks ago
- ☆13Updated 5 months ago