radarFudan / mamba
☆18, updated 11 months ago
Alternatives and similar repositories for mamba
Users interested in mamba are comparing it to the repositories listed below.
- A repository for DenseSSMs (☆88, updated last year)
- Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze… (☆110, updated 2 weeks ago)
- A More Fair and Comprehensive Comparison between KAN and MLP (☆174, updated last year)
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" (☆106, updated last week)
- ☆73, updated 7 months ago
- Awesome list of papers that extend Mamba to various applications (☆137, updated 3 months ago)
- Unofficial Implementation of Selective Attention Transformer (☆17, updated 10 months ago)
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 (☆57, updated last year)
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models (☆229, updated 4 months ago)
- ☆85, updated last year
- A simple torch implementation of high-performance Multi-Query Attention (☆16, updated 2 years ago)
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… (☆52, updated 6 months ago)
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" (☆98, updated 11 months ago)
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 (☆31, updated 4 months ago)
- ☆35, updated 6 months ago
- Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" (☆25, updated last week)
- Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States (☆73, updated last year)
- Transformers + Mambas + LSTMs All in One Model (☆11, updated last week)
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" (☆187, updated 2 weeks ago)
- State Space Models (☆70, updated last year)
- ☆48, updated 7 months ago
- Geometric-Mean Policy Optimization (☆80, updated last month)
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling (☆207, updated 2 weeks ago)
- HGRN2: Gated Linear RNNs with State Expansion (☆54, updated last year)
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind (☆128, updated last year)
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) (☆31, updated 5 months ago)
- This repository contains the code for the paper "TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back)… (☆12, updated 6 months ago)
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… (☆48, updated 5 months ago)
- Unofficial Implementation of Evolutionary Model Merging (☆39, updated last year)
- Self-contained pytorch implementation of a sinkhorn-based router, for mixture of experts or otherwise (☆37, updated last year)
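Several entries above (MoE-Mamba, SwitchHead, PEER, and the sinkhorn-based router) build on mixture-of-experts routing, where Sinkhorn normalization is one way to balance tokens across experts. A minimal NumPy sketch of that normalization step, not taken from any of the listed repositories; the function and variable names are illustrative:

```python
import numpy as np

def sinkhorn_route(logits, n_iters=8, eps=1e-9):
    """Alternately normalize columns (expert load) and rows (per-token
    probabilities) so routing approaches a balanced assignment."""
    p = np.exp(logits - logits.max())  # positive scores, numerically stable
    for _ in range(n_iters):
        p = p / (p.sum(axis=0, keepdims=True) + eps)  # balance load per expert
        p = p / (p.sum(axis=1, keepdims=True) + eps)  # each token sums to 1
    return p

rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 4))  # 16 tokens routed over 4 experts
probs = sinkhorn_route(logits)
```

Ending on the row normalization guarantees each token's expert probabilities sum to 1, while the repeated column normalization pushes the per-expert load toward uniform; the listed repositories may differ in iteration count, temperature, and how the balanced scores are turned into hard top-k assignments.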