srush / annotated-mambaLinks

Annotated version of the Mamba paper

☆487

Alternatives and similar repositories for annotated-mamba

Users that are interested in annotated-mamba are comparing it to the libraries listed below

Sorting:

facebookresearch / optimizers
For optimization algorithm research and development.
☆525Updated this week
lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆532Updated 2 months ago
srush / Transformer-Puzzles
Puzzles for exploring transformers
☆356Updated 2 years ago
HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆555Updated 7 months ago
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆288Updated 2 months ago
kvfrans / jax-diffusion-transformer
Implementation of Diffusion Transformer (DiT) in JAX
☆286Updated last year
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆221Updated 3 weeks ago
apple / ml-sigma-reparam
☆304Updated last year
KellerJordan / cifar10-airbench
CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds
☆274Updated 2 weeks ago
google-deepmind / nanodo
☆275Updated last year
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆565Updated this week
stanford-crfm / levanter
Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
☆627Updated this week
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆123Updated 7 months ago
pytorch-labs / attention-gym
Helpful tools and examples for working with flex-attention
☆904Updated 2 weeks ago
PeaBrane / mamba-tiny
Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence).
☆120Updated 9 months ago
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆184Updated 11 months ago
nanowell / AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
☆184Updated 10 months ago
pbelcak / fastfeedforward
A repository for log-time feedforward networks
☆222Updated last year
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆185Updated 8 months ago
lucidrains / st-moe-pytorch
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆352Updated last year
srush / annotated-s4
Implementation of https://srush.github.io/annotated-s4
☆500Updated last month
srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆1,079Updated last year
apple / ml-sigmoid-attention
☆293Updated 3 months ago
HomebrewML / HeavyBall
Efficient optimizers
☆252Updated last week
gpu-mode / profiling-cuda-in-torch
☆162Updated last year
mlcommons / algorithmic-efficiency
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement…
☆389Updated this week
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆238Updated last month
nikhilvyas / SOAP
☆206Updated 8 months ago
gautierdag / bpeasy
Fast bare-bones BPE for modern tokenizer training
☆160Updated last month
marin-community / marin
☆347Updated this week