fla-org / flash-linear-attentionLinks

🚀 Efficient implementations of state-of-the-art linear attention models

☆3,517

Alternatives and similar repositories for flash-linear-attention

Users that are interested in flash-linear-attention are comparing it to the libraries listed below

Sorting:

KellerJordan / Muon
Muon is an optimizer for hidden layers in neural networks
☆1,888Updated 3 months ago
fla-org / native-sparse-attention
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
☆903Updated 7 months ago
huggingface / picotron
Minimalistic 4D-parallelism distributed training framework for education purpose
☆1,856Updated last month
srush / Triton-Puzzles
Puzzles for learning Triton
☆2,036Updated 11 months ago
pytorch / torchtitan
A PyTorch native platform for training generative AI models
☆4,561Updated this week
lucidrains / native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
☆772Updated 2 months ago
meta-pytorch / attention-gym
Helpful tools and examples for working with flex-attention
☆1,020Updated last week
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆2,821Updated last week
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla…
☆2,834Updated this week
pytorch / ao
PyTorch native quantization and sparsity for training and inference
☆2,438Updated this week
mirage-project / mirage
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆1,891Updated this week
test-time-training / ttt-lm-pytorch
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
☆1,263Updated last year
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆3,952Updated this week
MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,336Updated 2 months ago
ML-GSAI / LLaDA
Official PyTorch implementation for "Large Language Diffusion Models"
☆3,079Updated last week
KellerJordan / modded-nanogpt
NanoGPT (124M) in 3 minutes
☆3,565Updated last week
policy-gradient / GRPO-Zero
Implementing DeepSeek R1's GRPO algorithm from scratch
☆1,621Updated 6 months ago
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,267Updated last month
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆3,658Updated this week
lucidrains / titans-pytorch
Unofficial implementation of Titans, SOTA memory for transformers, in Pytorch
☆1,474Updated last week
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆2,170Updated last week
HuangOwen / Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
☆1,690Updated 3 months ago
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,318Updated 3 months ago
XueFuzhao / awesome-mixture-of-experts
A collection of AWESOME things about mixture-of-experts
☆1,216Updated 10 months ago
facebookresearch / blt
Code for BLT research paper
☆1,995Updated 5 months ago
MoonshotAI / MoBA
MoBA: Mixture of Block Attention for Long-Context LLMs
☆1,941Updated 6 months ago
jiaweizzhao / GaLore
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
☆1,610Updated 11 months ago
zhuzilin / ring-flash-attention
Ring attention implementation with flash attention
☆901Updated last month
facebookresearch / coconut
Training Large Language Model to Reason in a Continuous Latent Space
☆1,297Updated 2 months ago
horseee / Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
☆1,874Updated 4 months ago