nikhilvyas / SOAP_MUON
Combining SOAP and MUON
☆19 · Feb 11, 2025 · Updated last year
Alternatives and similar repositories for SOAP_MUON
Users who are interested in SOAP_MUON are comparing it to the libraries listed below.
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated last year
- Code for the paper "Function-Space Learning Rates" ☆25 · Jun 3, 2025 · Updated 8 months ago
- Unofficial implementation of the paper "Exploring the Space of Key-Value-Query Models with Intention" ☆12 · May 24, 2023 · Updated 2 years ago
- Official repo for "Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics" ☆71 · Jan 13, 2026 · Updated last month
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Apr 17, 2024 · Updated last year
- Official repository for the paper "Exploring the Promise and Limits of Real-Time Recurrent Learning" (ICLR 2024) ☆13 · Jun 11, 2025 · Updated 8 months ago
- Official repository for the paper "Automating Continual Learning" ☆17 · Jun 11, 2025 · Updated 8 months ago
- Official code repository for the paper "Key-value memory in the brain" ☆31 · Feb 25, 2025 · Updated 11 months ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆32 · Jul 17, 2023 · Updated 2 years ago
- Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation" ☆14 · May 26, 2025 · Updated 8 months ago
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… ☆19 · Mar 15, 2025 · Updated 10 months ago
- Mamba support for transformer lens ☆19 · Sep 17, 2024 · Updated last year
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆14 · Jan 4, 2024 · Updated 2 years ago
- Repository for Sparse Universal Transformers ☆20 · Oct 23, 2023 · Updated 2 years ago
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 · May 8, 2025 · Updated 9 months ago
- Parallel Associative Scan for Language Models ☆18 · Jan 8, 2024 · Updated 2 years ago
- Efficient PScan implementation in PyTorch ☆17 · Jan 2, 2024 · Updated 2 years ago
- PyCUDA based PyTorch Extension Made Easy ☆26 · Mar 22, 2024 · Updated last year
- CUDA 12.2 HMM demos ☆20 · Jul 26, 2024 · Updated last year
- Implementations and experimentation on mHC by deepseek - https://arxiv.org/abs/2512.24880 ☆302 · Updated this week
- GitHub repository for the HOI4 ULTRA Project. ☆11 · Updated this week
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Aug 20, 2024 · Updated last year
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆56 · Dec 4, 2024 · Updated last year
- Fast Discounted Cumulative Sums in PyTorch ☆97 · Aug 28, 2021 · Updated 4 years ago
- The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization". ☆34 · Jun 11, 2025 · Updated 8 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Jun 6, 2024 · Updated last year
- Checkpointable dataset utilities for foundation model training ☆32 · Jan 29, 2024 · Updated 2 years ago
- train with kittens! ☆63 · Oct 25, 2024 · Updated last year
- [NeurIPS 23' Oral] Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity ☆28 · Apr 24, 2024 · Updated last year