alexzhang13 / flashattention2-custom-maskLinks

Triton implementation of FlashAttention2 that adds Custom Masks.

☆128

Alternatives and similar repositories for flashattention2-custom-mask

Users that are interested in flashattention2-custom-mask are comparing it to the libraries listed below

Sorting:

XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆186Updated 2 months ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆220Updated last month
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆221Updated last month
Dao-AILab / grouped-latent-attention
☆123Updated 2 months ago
FasterDecoding / TEAL
☆137Updated 5 months ago
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆212Updated 11 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆149Updated last month
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆230Updated 8 months ago
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆119Updated last year
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆72Updated last year
yuezhouhu / 2by4-pretrain
Efficient 2:4 sparse training algorithms and implementations
☆56Updated 7 months ago
epfml / dynamic-sparse-flash-attention
☆147Updated 2 years ago
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆213Updated last month
sustcsonglin / linear-attention-and-beyond-slides
☆79Updated 5 months ago
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆311Updated 3 weeks ago
mit-han-lab / Block-Sparse-Attention
A sparse attention kernel supporting mix sparse patterns
☆262Updated 5 months ago
HanGuo97 / log-linear-attention
☆232Updated 2 months ago
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆106Updated 2 months ago
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆103Updated 4 months ago
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆258Updated last week
z-lab / sparselora
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆48Updated last month
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆85Updated 7 months ago
ByteDance-Seed / VeOmni
VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework
☆399Updated this week
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆130Updated last year
OpenSparseLLMs / Linear-MoE
☆113Updated 2 months ago
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆42Updated last month
NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆320Updated this week
FasterDecoding / SnapKV
☆268Updated 3 weeks ago
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆69Updated 5 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆86Updated last month