PeaBrane / mamba-tiny

Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heinsen sequence).
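
For readers unfamiliar with the logcumsumexp trick named in the description, here is a minimal sketch of the general idea, not the repository's actual code. It evaluates the first-order recurrence h_t = a_t * h_{t-1} + b_t (with h_0 = 0) using only a cumulative sum and a cumulative log-sum-exp. The function names are hypothetical, and the sketch assumes every a_t and b_t is strictly positive so that real logarithms suffice. It is written in plain Python for readability; in PyTorch the two cumulative operations would presumably map to `torch.cumsum` and `torch.logcumsumexp`.

```python
import math

def logcumsumexp(xs):
    """Running log(sum(exp(x))) over a list of finite floats, computed stably."""
    out, acc = [], -math.inf
    for x in xs:
        hi = max(acc, x)
        # Subtract the running maximum before exponentiating to avoid overflow.
        acc = hi + math.log(math.exp(acc - hi) + math.exp(x - hi))
        out.append(acc)
    return out

def scan_via_logcumsumexp(a, b):
    """Compute h_t = a_t * h_{t-1} + b_t with h_0 = 0.

    Assumption for this sketch: all a_t > 0 and b_t > 0.
    """
    # A_t = log(a_1) + ... + log(a_t), a plain cumulative sum.
    log_a_cum, total = [], 0.0
    for a_t in a:
        total += math.log(a_t)
        log_a_cum.append(total)
    # h_t = exp(A_t) * sum_{s <= t} exp(log(b_s) - A_s)
    shifted = [math.log(b_s) - A_s for b_s, A_s in zip(b, log_a_cum)]
    return [math.exp(A_t + lse) for A_t, lse in zip(log_a_cum, logcumsumexp(shifted))]

def scan_sequential(a, b):
    """Reference implementation: the same recurrence as an explicit loop."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out

a = [0.9, 0.5, 0.8]
b = [1.0, 2.0, 0.5]
fast = scan_via_logcumsumexp(a, b)
slow = scan_sequential(a, b)
```

Here `fast` and `slow` agree up to floating-point error. The point of the reformulation is that both cumulative operations are parallel-scan primitives on accelerators, whereas the explicit loop is inherently sequential. Handling zero or negative values requires extra care (for example, complex logarithms), which this sketch leaves out.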

☆125

Alternatives and similar repositories for mamba-tiny

Users who are interested in mamba-tiny compare it to the repositories listed below.

srush / annotated-mamba
Annotated version of the Mamba paper
☆490 Updated last year
AvivBick / awesome-ssm-ml
Reading list for research topics in state-space models
☆329 Updated 4 months ago
KellerJordan / cifar10-airbench
CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds
☆320 Updated 3 months ago
goombalab / hydra
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
☆161 Updated 9 months ago
nikhilvyas / SOAP
☆220 Updated 10 months ago
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆291 Updated 4 months ago
apapiu / mamba_small_bench
Trying out the Mamba architecture on small examples (cifar-10, shakespeare char level etc.)
☆47 Updated last year
apple / ml-sigmoid-attention
☆302 Updated 6 months ago
lindermanlab / S5
☆306 Updated 9 months ago
lucidrains / minGRU-pytorch
Implementation of the proposed minGRU in Pytorch
☆306 Updated 7 months ago
lucidrains / st-moe-pytorch
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆366 Updated last year
kyegomez / Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
☆192 Updated this week
kvfrans / jax-diffusion-transformer
Implementation of Diffusion Transformer (DiT) in JAX
☆294 Updated last year
CG80499 / KAN-GPT-2
Training small GPT-2 style models using Kolmogorov-Arnold networks.
☆121 Updated last year
alxndrTL / othello_mamba
Evaluating the Mamba architecture on the Othello game
☆48 Updated last year
nanowell / AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
☆186 Updated last year
srush / annotated-s4
Implementation of https://srush.github.io/annotated-s4
☆504 Updated 4 months ago
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆189 Updated last year
hkproj / mamba-notes
Notes on the Mamba and the S4 model (Mamba: Linear-Time Sequence Modeling with Selective State Spaces)
☆173 Updated last year
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192 Updated 11 months ago
jacobfa / fft
☆128 Updated 2 months ago
NX-AI / xlstm-jax
Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.…
☆104 Updated 9 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆168 Updated 4 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆337 Updated last month
pbelcak / fastfeedforward
A repository for log-time feedforward networks
☆222 Updated last year
AmeenAli / HiddenMambaAttn
Official PyTorch Implementation of "The Hidden Attention of Mamba Models"
☆228 Updated 2 weeks ago
bobby-he / simplified_transformers
☆292 Updated 10 months ago
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆216 Updated last year
goombalab / phi-mamba
Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…
☆116 Updated last year
lucidrains / block-recurrent-transformer-pytorch
Implementation of Block Recurrent Transformer - Pytorch
☆221 Updated last year