srush / annotated-mamba
Annotated version of the Mamba paper
☆475 Updated last year
Alternatives and similar repositories for annotated-mamba:
Users who are interested in annotated-mamba are comparing it to the libraries listed below.
- Helpful tools and examples for working with flex-attention ☆689 Updated last week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆524 Updated last month
- Implementation of Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆506 Updated 4 months ago
- For optimization algorithm research and development. ☆498 Updated this week
- Implementation of https://srush.github.io/annotated-s4 ☆485 Updated 2 years ago
- Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆318 Updated 9 months ago
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆650 Updated 3 months ago
- Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence). ☆114 Updated 5 months ago
- Some preliminary explorations of Mamba's context scaling. ☆213 Updated last year
- Understand and test language model architectures on synthetic tasks. ☆184 Updated 2 weeks ago
- Reading list for research topics in state-space models ☆267 Updated 2 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆276 Updated this week
- Puzzles for exploring transformers ☆333 Updated last year
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆548 Updated 2 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆223 Updated last month
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆557 Updated this week
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆306 Updated 2 months ago
- Large Context Attention ☆690 Updated last month
- Implementation of Diffusion Transformer (DiT) in JAX ☆269 Updated 9 months ago
- Universal Tensor Operations in Einstein-Inspired Notation for Python. ☆361 Updated last month
- ☆165 Updated last year
- A repository for research on medium sized language models. ☆493 Updated 2 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆214 Updated 3 weeks ago
- ☆287 Updated 3 months ago
- ☆301 Updated 9 months ago
- What would you do with 1000 H100s... ☆1,016 Updated last year
- ☆261 Updated last month
- ☆182 Updated this week
- Normalized Transformer (nGPT) ☆162 Updated 4 months ago
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆232 Updated 2 weeks ago