AmeenAli / HiddenMambaAttn
Official PyTorch Implementation of "The Hidden Attention of Mamba Models"
☆186 · Updated 3 months ago
Related projects:
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆94 · Updated last month
- Awesome list of papers that extend Mamba to various applications. ☆124 · Updated 2 weeks ago
- A Triton Kernel for incorporating Bi-Directionality in Mamba2 ☆43 · Updated last week
- Reading list for research topics in state-space models ☆209 · Updated last week
- An efficient PyTorch implementation of selective scan in one file that works on both CPU and GPU, with corresponding mathematical derivatio… ☆64 · Updated 6 months ago
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆120 · Updated last week
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆233 · Updated 4 months ago
- Notes on Mamba and the S4 model (Mamba: Linear-Time Sequence Modeling with Selective State Spaces) ☆141 · Updated 8 months ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆153 · Updated last week
- Official code for "TOAST: Transfer Learning via Attention Steering" ☆186 · Updated last year
- Simba ☆172 · Updated 5 months ago
- Causal depthwise conv1d in CUDA, with a PyTorch interface ☆283 · Updated last month
- Kolmogorov-Arnold Transformer: A PyTorch Implementation with CUDA kernel ☆221 · Updated this week
- A repository for DenseSSMs ☆86 · Updated 5 months ago
- Open source implementation of "Vision Transformers Need Registers" ☆126 · Updated last week
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆61 · Updated this week
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆48 · Updated last week
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities ☆85 · Updated 6 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆104 · Updated 6 months ago
- A PyTorch implementation of the paper "ZigMa: A DiT-Style Mamba-based Diffusion Model" (ECCV 2024) ☆252 · Updated 3 weeks ago
- Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis ☆177 · Updated last month
- Minimal Mamba-2 implementation in PyTorch ☆89 · Updated 3 months ago
- A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http… ☆101 · Updated 9 months ago
- Decomposing and Editing Predictions by Modeling Model Computation ☆97 · Updated 3 months ago
- Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders ☆86 · Updated last month
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆33 · Updated 3 months ago
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆215 · Updated 8 months ago
- Implementation of 🌻 Mirasol, a SOTA multimodal autoregressive model out of Google DeepMind, in PyTorch ☆87 · Updated 8 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆56 · Updated this week
- ☆164 · Updated 8 months ago