goombalab / phi-mamba
Official implementation of Phi-Mamba, a MOHAWK-distilled model ("Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models")
☆116 · Updated last year
Alternatives and similar repositories for phi-mamba
Users interested in phi-mamba are comparing it to the repositories listed below.
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆231 · Updated last week
- Some preliminary explorations of Mamba's context scaling. ☆216 · Updated last year
- ☆86 · Updated last year
- Stick-breaking attention ☆61 · Updated 3 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆179 · Updated 4 months ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆108 · Updated last week
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆131 · Updated last month
- Understand and test language model architectures on synthetic tasks. ☆233 · Updated last month
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆129 · Updated last year
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆31 · Updated 6 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆161 · Updated 8 months ago
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models ☆42 · Updated 3 months ago
- A MAD laboratory to improve AI architecture designs 🧪