sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
☆1,339 · Updated this week
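The headline repo implements linear attention. The core idea (sketched here in NumPy for illustration only; this is not the repo's API, and softmax/normalization are deliberately omitted) is that without the softmax, attention becomes a chain of matrix products that associativity lets you regroup from O(N²) in sequence length to O(N):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 4  # sequence length, head dimension
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))

# Quadratic form: materializes the full N x N attention matrix.
quadratic = (Q @ K.T) @ V        # O(N^2 * d)

# Linear form: regroups the same product around a d x d state,
# which is what enables chunked / recurrent O(N) computation.
linear = Q @ (K.T @ V)           # O(N * d^2)

assert np.allclose(quadratic, linear)
```

Real linear-attention kernels add feature maps, normalization, and causal masking on top of this regrouping; the sketch only shows why dropping softmax makes the trick possible.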
Related projects
Alternatives and complementary repositories for flash-linear-attention
- Helpful tools and examples for working with flex-attention ☆469 · Updated 3 weeks ago
- Annotated version of the Mamba paper ☆457 · Updated 8 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆476 · Updated 3 weeks ago
- Puzzles for learning Triton ☆1,135 · Updated this week
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA ☆636 · Updated this week
- A simple and efficient Mamba implementation in pure PyTorch and MLX ☆1,012 · Updated 2 months ago
- Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆1,040 · Updated 4 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton ☆483 · Updated 3 weeks ago
- [ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation ☆633 · Updated last month
- Ring attention implementation with flash attention ☆585 · Updated last week
- Building blocks for foundation models ☆394 · Updated 10 months ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆537 · Updated 6 months ago
- Code for "Adam-mini: Use Fewer Learning Rates To Gain More" (https://arxiv.org/abs/2406.16793) ☆328 · Updated 3 weeks ago
- Tile primitives for speedy kernels ☆1,658 · Updated this week
- Schedule-Free Optimization in PyTorch ☆1,898 · Updated 2 weeks ago
- Tutel MoE: An Optimized Mixture-of-Experts Implementation ☆735 · Updated this week
- Official implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆335 · Updated last week
- Transformers with Arbitrarily Large Context ☆641 · Updated 3 months ago
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆626 · Updated 7 months ago
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆366 · Updated 3 months ago
- Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch ☆571 · Updated last week
- A bibliography and survey of the papers surrounding o1 ☆754 · Updated this week
- A collection of AWESOME things about mixture-of-experts ☆972 · Updated 3 months ago
- Reading list for research topics in state-space models ☆241 · Updated 2 weeks ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch ☆293 · Updated 5 months ago
- depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile ☆500 · Updated 2 weeks ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆803 · Updated 3 months ago
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,435 · Updated 3 weeks ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,149 · Updated last month