kyegomez / MambaFormer
Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks"
☆21 Updated this week
Alternatives and similar repositories for MambaFormer
Users who are interested in MambaFormer are comparing it to the libraries listed below.
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024) ☆15 Updated 5 months ago
- Explorations into improving ViTArc with Slot Attention ☆42 Updated 8 months ago
- [ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation ☆42 Updated 3 months ago
- Implementation of Spectral State Space Models ☆16 Updated last year
- Implementation of a modular, high-performance, and simplistic mamba for high-speed applications ☆35 Updated 7 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch ☆25 Updated 5 months ago
- We study toy models of skill learning. ☆28 Updated 5 months ago
- Self contained pytorch implementation of a sinkhorn based router, for mixture of experts or otherwise ☆36 Updated 9 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆55 Updated 10 months ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ☆38 Updated 2 months ago
- Implementation of GateLoop Transformer in Pytorch and Jax ☆89 Updated last year
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆56 Updated last year
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group ☆36 Updated 9 months ago
- Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation" ☆14 Updated last month
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆55 Updated 2 months ago
- More dimensions = More fun ☆22 Updated 10 months ago
- Official Code Repository for the paper "Key-value memory in the brain" ☆26 Updated 4 months ago
- Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model" ☆59 Updated 7 months ago
- Deep Networks Grok All the Time and Here is Why ☆37 Updated last year
- Pytorch (Lightning) implementation of the Mamba model ☆29 Updated 2 months ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆58 Updated last year
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆83 Updated last year
- Implementation and explorations into Blackbox Gradient Sensing (BGS), an evolutionary strategies approach proposed in a Google Deepmind p… ☆16 Updated 3 weeks ago
- Implementation of a transformer for reinforcement learning using `x-transformers` ☆59 Updated last week
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆16 Updated 2 months ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆19 Updated last month
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆55 Updated last year
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆138 Updated 4 months ago
- Implementation of Infini-Transformer in Pytorch ☆111 Updated 5 months ago
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆37 Updated 4 months ago