Weixin-Liang / Mixture-of-Mamba
☆40Updated 3 months ago
Alternatives and similar repositories for Mixture-of-Mamba
Users that are interested in Mixture-of-Mamba are comparing it to the libraries listed below
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025☆22Updated 2 weeks ago
- [ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation☆42Updated 2 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆54Updated 8 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248☆52Updated 10 months ago
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)☆20Updated 9 months ago
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains☆30Updated last week
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)☆27Updated last month
- Unofficial Implementation of Selective Attention Transformer☆16Updated 6 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆26Updated 6 months ago
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n…☆42Updated 6 months ago
- More dimensions = More fun☆22Updated 9 months ago
- PyTorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆24Updated 3 weeks ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM☆54Updated last year
- Code for Heima☆42Updated 3 weeks ago
- Official repo of paper LM2☆39Updated 3 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆91Updated this week
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule☆161Updated last month
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆38Updated 7 months ago
- Official implementation of ECCV24 paper: POA☆24Updated 9 months ago
- Minimal Implementation of Visual Autoregressive Modelling (VAR)☆33Updated last month
- ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2☆64Updated 5 months ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction.☆36Updated last month
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models"☆222Updated 11 months ago
- A repository for DenseSSMs☆87Updated last year
- State Space Models☆67Updated last year
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"☆53Updated last month
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]☆54Updated 5 months ago