fla-org / flame

🔥 A minimal training framework for scaling FLA models

☆220

Alternatives and similar repositories for flame

Users who are interested in flame are comparing it to the libraries listed below.

XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆186 Updated 2 months ago
Dao-AILab / grouped-latent-attention
☆123 Updated 2 months ago
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆85 Updated 7 months ago
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆230 Updated 8 months ago
alexzhang13 / flashattention2-custom-mask
Triton implementation of FlashAttention2 that adds Custom Masks.
☆128 Updated 11 months ago
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆221 Updated last month
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆149 Updated last month
NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆320 Updated this week
sustcsonglin / linear-attention-and-beyond-slides
☆79 Updated 5 months ago
FasterDecoding / TEAL
☆137 Updated 5 months ago
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆213 Updated last month
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆216 Updated last year
facebookresearch / PhysicsLM4
Physics of Language Models, Part 4
☆204 Updated last week
OpenSparseLLMs / MoM
☆95 Updated 3 months ago
OpenSparseLLMs / Linear-MoE
☆113 Updated 2 months ago
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆124 Updated 2 months ago
HanGuo97 / log-linear-attention
☆232 Updated 2 months ago
ByteDance-Seed / VeOmni
VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework
☆399 Updated this week
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆130 Updated last year
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆193 Updated 4 months ago
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆212 Updated 11 months ago
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆119 Updated last year
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆258 Updated last week
mit-han-lab / Block-Sparse-Attention
A sparse attention kernel supporting mix sparse patterns
☆262 Updated 5 months ago
microsoft / SeerAttention
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆141 Updated last week
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆238 Updated 2 months ago
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆108 Updated last month
epfml / dynamic-sparse-flash-attention
☆147 Updated 2 years ago
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆311 Updated 3 weeks ago
FasterDecoding / SnapKV
☆268 Updated 3 weeks ago