PeaBrane / mamba-tiny
A simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (the Heinsen sequence trick).
☆112 · Updated 5 months ago
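The summary above refers to computing the SSM recurrence h_t = a_t · h_{t-1} + b_t in parallel via log space: a cumulative sum of log a_t plus a running log-sum-exp recovers every h_t without a sequential loop (the repo uses `torch.logcumsumexp` for this). A minimal pure-Python sketch of the idea, assuming a_t, b_t > 0 so all logarithms stay real; `logcumsumexp` and `heinsen_scan` below are illustrative names, not the repo's API:

```python
import math
from itertools import accumulate

def logcumsumexp(xs):
    """Running log-sum-exp over a list, numerically stable."""
    out, run = [], None
    for x in xs:
        # lse(run, x) = max(run, x) + log(1 + exp(-|run - x|))
        run = x if run is None else max(run, x) + math.log1p(math.exp(-abs(run - x)))
        out.append(run)
    return out

def heinsen_scan(log_a, log_b):
    """Solve h_t = a_t * h_{t-1} + b_t with h_0 = 0, given log a_t and log b_t.

    Assumes a_t, b_t > 0 so every log is real-valued."""
    a_star = list(accumulate(log_a))  # log of the prefix products of a
    shifted = [lb - s for lb, s in zip(log_b, a_star)]
    return [math.exp(s + c) for s, c in zip(a_star, logcumsumexp(shifted))]

# Matches the naive sequential recurrence:
a, b = [0.5, 0.8, 0.9], [1.0, 2.0, 0.5]
h = heinsen_scan([math.log(x) for x in a], [math.log(x) for x in b])
# naive: h1 = 1.0, h2 = 0.8*1.0 + 2.0 = 2.8, h3 = 0.9*2.8 + 0.5 = 3.02
```

In the PyTorch version, the running reduction becomes a single `torch.logcumsumexp` over the sequence dimension, which is what lets the selective scan run in parallel rather than step by step.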
Alternatives and similar repositories for mamba-tiny:
Users interested in mamba-tiny are comparing it to the repositories listed below.
- Reading list for research topics in state-space models ☆270 · Updated 2 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆126 · Updated 2 months ago
- Annotated version of the Mamba paper ☆478 · Updated last year
- ☆286 · Updated 2 months ago
- Trying out the Mamba architecture on small examples (CIFAR-10, character-level Shakespeare, etc.) ☆44 · Updated last year
- Implementation of the Diffusion Transformer (DiT) in JAX ☆270 · Updated 9 months ago
- Accelerated First Order Parallel Associative Scan ☆180 · Updated 7 months ago
- PyTorch implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆165 · Updated 2 months ago
- ☆172 · Updated 4 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from Nvidia AI ☆276 · Updated 2 weeks ago
- Evaluating the Mamba architecture on the Othello game ☆46 · Updated 11 months ago
- Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines ☆368 · Updated 10 months ago
- Some preliminary explorations of Mamba's context scaling ☆212 · Updated last year
- The AdEMAMix Optimizer: Better, Faster, Older ☆179 · Updated 6 months ago
- Training small GPT-2-style models using Kolmogorov-Arnold networks ☆117 · Updated 10 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆216 · Updated this week
- An attempt to make the multiple residual streams from ByteDance's Hyper-Connections paper accessible to the public ☆80 · Updated last month
- Implementation of the proposed minGRU in PyTorch ☆283 · Updated 3 weeks ago
- Integrating Mamba/SSMs with Transformers for enhanced long-context, high-quality sequence modeling ☆190 · Updated 2 months ago
- Official implementation of Phi-Mamba, a MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…) ☆102 · Updated 6 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆223 · Updated last month
- Implementations of various linear RNN layers using PyTorch and Triton ☆50 · Updated last year
- Notes on the Mamba and S4 models (Mamba: Linear-Time Sequence Modeling with Selective State Spaces) ☆163 · Updated last year
- Benchmarking and testing FastKAN ☆73 · Updated 10 months ago
- Implementation of https://srush.github.io/annotated-s4 ☆487 · Updated 2 years ago
- Official PyTorch implementation of "The Hidden Attention of Mamba Models" ☆216 · Updated 10 months ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch ☆325 · Updated 9 months ago
- Supporting PyTorch FSDP for optimizers ☆80 · Updated 3 months ago
- PyTorch implementation of Structured State Space for Sequence Modeling (S4), based on the Annotated S4 ☆78 · Updated last year
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆103 · Updated 4 months ago