Weixin-Liang / Mixture-of-Mamba
☆40 Updated 2 months ago
Alternatives and similar repositories for Mixture-of-Mamba:
Users who are interested in Mixture-of-Mamba are comparing it to the libraries listed below.
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆26 Updated 2 weeks ago
- HGRN2: Gated Linear RNNs with State Expansion ☆54 Updated 8 months ago
- Official PyTorch Implementation for Task Vectors are Cross-Modal ☆22 Updated 4 months ago
- Unofficial Implementation of Selective Attention Transformer ☆16 Updated 5 months ago
- The code implementation of Symbolic-MoE ☆27 Updated last month
- ☆78 Updated 8 months ago
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆53 Updated 2 weeks ago
- More dimensions = More fun ☆22 Updated 8 months ago
- [ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation ☆42 Updated last month
- State Space Models ☆69 Updated 11 months ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆32 Updated last month
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆52 Updated 3 weeks ago
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆37 Updated 6 months ago
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆42 Updated 5 months ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆103 Updated 7 months ago
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆77 Updated last week
- Official implementation of ECCV24 paper: POA ☆24 Updated 8 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆127 Updated 3 months ago
- ☆31 Updated 3 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆90 Updated this week
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆51 Updated 10 months ago
- Code for Heima ☆40 Updated this week
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆28 Updated last year
- Awesome list of papers that extend Mamba to various applications. ☆132 Updated 2 weeks ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆71 Updated 3 weeks ago
- ☆36 Updated last month
- ☆17 Updated 3 months ago
- ☆39 Updated last month
- We study toy models of skill learning. ☆25 Updated 3 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆156 Updated last month