kyegomez / MambaFormer
Implementation of MambaFormer in PyTorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks"
☆21 · Updated last week
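The description above names the architecture but not its layout. Below is a minimal, hedged sketch of the MambaFormer-style block ordering reported in the paper (a state-space block first, then alternating attention and state-space blocks, with no positional encoding). `SimpleSSMBlock`, `AttentionBlock`, and `MambaFormerSketch` are illustrative stand-ins, not the API of the kyegomez/MambaFormer or Zeta packages.

```python
# Minimal sketch of a MambaFormer-style stack (assumption: not the official
# kyegomez/MambaFormer or Zeta API). The paper's hybrid places a Mamba-style
# block first, then alternates attention and Mamba-style blocks, without
# positional encoding.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSSMBlock(nn.Module):
    """Hypothetical stand-in for a Mamba block: gated causal depthwise conv."""
    def __init__(self, dim: int, kernel_size: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.in_proj = nn.Linear(dim, 2 * dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size - 1, groups=dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        residual = x
        h, gate = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        # Depthwise conv over the sequence, trimmed to length for causality.
        h = self.conv(h.transpose(1, 2))[..., : x.shape[1]].transpose(1, 2)
        return residual + self.out_proj(h * F.silu(gate))


class AttentionBlock(nn.Module):
    """Pre-norm causal self-attention with a residual connection."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        mask = torch.triu(torch.ones(x.shape[1], x.shape[1], dtype=torch.bool, device=x.device), 1)
        out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        return x + out


class MambaFormerSketch(nn.Module):
    """SSM block first, then `depth` pairs of (attention, SSM); no positional embedding."""
    def __init__(self, dim: int, depth: int, heads: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            [SimpleSSMBlock(dim)]
            + [blk for _ in range(depth) for blk in (AttentionBlock(dim, heads), SimpleSSMBlock(dim))]
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return self.norm(x)


if __name__ == "__main__":
    model = MambaFormerSketch(dim=64, depth=2)
    tokens = torch.randn(1, 16, 64)    # (batch, seq, dim) embeddings
    print(model(tokens).shape)         # torch.Size([1, 16, 64])
```

To get closer to the actual repository, `SimpleSSMBlock` would be replaced by a real selective state-space block (e.g. one provided by the mamba-ssm or Zeta packages); the interleaving pattern above is the part specific to MambaFormer.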
Alternatives and similar repositories for MambaFormer
Users interested in MambaFormer are comparing it to the libraries listed below
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch ☆25 · Updated 4 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆54 · Updated 9 months ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ☆36 · Updated 2 months ago
- We study toy models of skill learning. ☆28 · Updated 4 months ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆55 · Updated 2 months ago
- [ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation ☆42 · Updated 3 months ago
- Explorations into improving ViTArc with Slot Attention ☆41 · Updated 7 months ago
- Implementation of Spectral State Space Models ☆16 · Updated last year
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group ☆36 · Updated 8 months ago
- Implementation of a transformer for reinforcement learning using `x-transformers` ☆48 · Updated this week
- Induce brain-like topographic structure in your neural networks ☆62 · Updated 2 weeks ago
- Implementation of a modular, high-performance, and simplistic mamba for high-speed applications ☆35 · Updated 6 months ago
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆29 · Updated 4 years ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆19 · Updated last week
- Implementation and explorations into Blackbox Gradient Sensing (BGS), an evolutionary strategies approach proposed in a Google Deepmind p… ☆13 · Updated this week
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆16 · Updated last month
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 · Updated 8 months ago
- ☆23 · Updated 8 months ago
- Utilities for PyTorch distributed ☆24 · Updated 3 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆137 · Updated 4 months ago
- ☆26 · Updated 10 months ago
- Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation" ☆14 · Updated last week
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens)