radarFudan / mambaLinks

☆18

Alternatives and similar repositories for mamba

Users that are interested in mamba are comparing it to the libraries listed below

Sorting:

fangyuan-ksgk / selective-attention-transformer
Unofficial Implementation of Selective Attention Transformer
☆17Updated 11 months ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆54Updated last year
Weixin-Liang / Mixture-of-Mamba
☆50Updated 8 months ago
WailordHe / DenseSSM
A repository for DenseSSMs
☆88Updated last year
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆108Updated this week
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113Updated 9 months ago
TianjinYellow / SPAM-Optimizer
☆34Updated 7 months ago
krafton-ai / mambaformer-icl
MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248
☆57Updated last year
MarkXCloud / CSpD
The official repo of continuous speculative decoding
☆30Updated 6 months ago
kyegomez / MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…
☆110Updated last week
Adamdad / rational_kat_cu
☆75Updated 8 months ago
assafbk / DeciMamba
DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)
☆31Updated 6 months ago
badripatro / mamba360
State Space Models
☆70Updated last year
pengzhangzhi / Awesome-Mamba
Awesome list of papers that extend Mamba to various applications.
☆138Updated 4 months ago
KindXiaoming / physics_of_skill_learning
We study toy models of skill learning.
☆31Updated 9 months ago
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆99Updated last year
yu-rp / KANbeFair
A More Fair and Comprehensive Comparison between KAN and MLP
☆175Updated last year
hyperevolnet / Terminator
The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction.
☆39Updated 6 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆99Updated last year
zhixuan-lin / forgetting-transformer
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆131Updated 3 weeks ago
RobertCsordas / moeut
☆85Updated last year
kyegomez / MambaTransformer
Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling
☆206Updated 2 weeks ago
g-luo / vlm_cross_modal_reps
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
☆31Updated 5 months ago
JinjieNi / dlms-are-super-data-learners
The official github repo for "Diffusion Language Models are Super Data Learners".
☆134Updated 2 weeks ago
test-time-training / ttt-lm-kernels
Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States
☆73Updated last year
nreHieW / minARImageGen
Autoregressive Image Generation
☆32Updated 4 months ago
kyegomez / VortexFusion
Transformers + Mambas + LSTMS All in One Model
☆12Updated 2 weeks ago
kyegomez / MultiQueryAttention
This is a simple torch implementation of the high performance Multi-Query Attention
☆15Updated 2 years ago
lucidrains / simplicial-attention
Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…
☆47Updated last month
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆59Updated last year