AtlasAnalyticsLab / AdaFisher
[ICLR 2025] AdaFisher: Adaptive Second Order Optimization via Fisher Information
☆51 · Updated 11 months ago
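AdaFisher adapts step sizes using an approximation of the Fisher information matrix. As a rough illustration of the general idea only (not the paper's algorithm, which uses a richer curvature approximation), one can precondition gradients by a running estimate of the diagonal empirical Fisher, here proxied by an EMA of squared gradients. The class name and hyperparameters below are hypothetical:

```python
import torch

class DiagFisherSGD(torch.optim.Optimizer):
    """Toy optimizer: scales each gradient coordinate by the inverse square
    root of an EMA of squared gradients, a common proxy for the diagonal
    empirical Fisher. Illustrative sketch only, not AdaFisher itself."""

    def __init__(self, params, lr=0.02, beta=0.99, eps=1e-8):
        super().__init__(params, dict(lr=lr, beta=beta, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "fisher" not in state:
                    state["fisher"] = torch.zeros_like(p)
                f = state["fisher"]
                # EMA of squared gradients (diagonal empirical-Fisher proxy)
                f.mul_(group["beta"]).addcmul_(p.grad, p.grad, value=1 - group["beta"])
                # Preconditioned update: g / sqrt(F_diag + eps)
                p.addcdiv_(p.grad, f.sqrt().add_(group["eps"]), value=-group["lr"])

# Toy quadratic problem: minimize ||w||^2
w = torch.nn.Parameter(torch.tensor([3.0, -2.0]))
opt = DiagFisherSGD([w], lr=0.02)
for _ in range(500):
    opt.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    opt.step()
```

With the curvature proxy in the denominator, the per-coordinate step size is roughly invariant to the gradient scale, which is the basic appeal of Fisher-based preconditioning.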
Alternatives and similar repositories for AdaFisher
Users interested in AdaFisher are comparing it to the repositories listed below.
- Optimizer, LR scheduler, and loss function collections in PyTorch · ☆388 · Updated this week
- ☆246 · Updated last year
- ☆21 · Updated 2 years ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" · ☆56 · Updated 3 months ago
- A repo based on XiLin Li's PSGD repo that extends some of the experiments · ☆14 · Updated last year
- The AdEMAMix Optimizer: Better, Faster, Older · ☆186 · Updated last year
- Implementation of the proposed minGRU in PyTorch · ☆319 · Updated last month
- Trying out the Mamba architecture on small examples (CIFAR-10, character-level Shakespeare, etc.) · ☆47 · Updated 2 years ago
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch · ☆135 · Updated 3 months ago
- Official PyTorch implementation of "The Hidden Attention of Mamba Models" · ☆231 · Updated 3 months ago
- ☆79 · Updated last year
- ☆52 · Updated last month
- ☆35 · Updated 3 years ago
- [CVPR 2024] Friendly Sharpness-Aware Minimization · ☆35 · Updated last year
- [NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models" 🐍 · ☆45 · Updated last year
- Attempt to make the multiple residual streams from ByteDance's Hyper-Connections paper accessible to the public · ☆168 · Updated 2 weeks ago
- ☆13 · Updated last year
- ☆36 · Updated 10 months ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch · ☆344 · Updated 10 months ago
- When it comes to optimizers, it's always better to be safe than sorry · ☆402 · Updated 4 months ago
- Implementation of the GateLoop Transformer in PyTorch and JAX · ☆92 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) · ☆82 · Updated 2 years ago
- ☆129 · Updated 6 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds · ☆352 · Updated 2 months ago
- PyTorch implementation of the xLSTM model by Beck et al. (2024) · ☆181 · Updated last year
- Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heisen sequence) · ☆130 · Updated last year
- Integrating Mamba/SSMs with Transformers for enhanced long-context, high-quality sequence modeling · ☆213 · Updated last week
- PyTorch implementation of "Simplified Structured State-Spaces for Sequence Modeling" (S5) · ☆82 · Updated last year
- A practical implementation of GradNorm (Gradient Normalization for Adaptive Loss Balancing) in PyTorch · ☆126 · Updated 5 months ago
- Efficient optimizers · ☆281 · Updated last month