srush / annotated-mamba
Annotated version of the Mamba paper
☆491 Updated last year
Alternatives and similar repositories for annotated-mamba
Users who are interested in annotated-mamba are comparing it to the repositories listed below
- For optimization algorithm research and development. ☆547 Updated 2 weeks ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆561 Updated 11 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆548 Updated 6 months ago
- Understand and test language model architectures on synthetic tasks. ☆240 Updated 2 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆294 Updated 6 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆584 Updated 3 months ago
- Implementation of https://srush.github.io/annotated-s4 ☆506 Updated 5 months ago
- Puzzles for exploring transformers ☆380 Updated 2 years ago
- Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆373 Updated last year
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆685 Updated last week
- ☆314 Updated last year
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆327 Updated 2 weeks ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆297 Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆135 Updated 11 months ago
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆334 Updated 11 months ago
- Helpful tools and examples for working with flex-attention ☆1,062 Updated 2 weeks ago
- ☆285 Updated last year
- ☆303 Updated 7 months ago
- Accelerated First Order Parallel Associative Scan ☆192 Updated last year
- Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence). ☆127 Updated last year
- A repository for log-time feedforward networks ☆223 Updated last year
- ☆177 Updated last year
- ☆166 Updated 2 years ago
- Efficient optimizers ☆276 Updated 3 weeks ago
- ☆293 Updated 11 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day ☆258 Updated 2 years ago
- Normalized Transformer (nGPT) ☆194 Updated last year
- What would you do with 1000 H100s... ☆1,132 Updated last year
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ☆401 Updated this week
- Implementation of the Llama architecture with RLHF + Q-learning ☆168 Updated 10 months ago