fla-org / flash-bidirectional-linear-attention

Triton implementation of bi-directional (non-causal) linear attention

☆56

Alternatives and similar repositories for flash-bidirectional-linear-attention

Users who are interested in flash-bidirectional-linear-attention are comparing it to the repositories listed below.

Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29 Updated last month
fla-org / fla-zoo
Flash-Linear-Attention models beyond language
☆19 Updated last month
zhixuan-lin / forgetting-transformer
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆131 Updated last month
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆36 Updated last year
svg-project / flash-kmeans
Fast and memory-efficient exact kmeans
☆111 Updated 3 weeks ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61 Updated last year
mit-han-lab / VisCompare
A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders
☆23 Updated 8 months ago
horseee / dKV-Cache
[NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models
☆110 Updated 5 months ago
TsinghuaC3I / Fourier-Position-Embedding
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
☆99 Updated 4 months ago
ylsung / rsq
Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"
☆19 Updated 4 months ago
BBuf / flash-rwkv
☆32 Updated last year
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24 Updated last year
OpenSparseLLMs / Linearization
☆61 Updated 3 months ago
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆94 Updated 10 months ago
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆119 Updated 4 months ago
z-lab / sparselora
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆58 Updated 3 months ago
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆239 Updated 3 months ago
berlino / gated_linear_attention
☆105 Updated last year
mdy666 / Scalable-Flash-Native-Sparse-Attention
☆42 Updated this week
ThisisBillhe / EfficientDM
[ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di…
☆66 Updated last year
HanGuo97 / log-linear-attention
☆251 Updated 4 months ago
TianjinYellow / SPAM-Optimizer
☆34 Updated 7 months ago
deep-spin / adasplash
AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention)
☆26 Updated 3 weeks ago
mit-han-lab / patch_conv
Patch convolution to avoid large GPU memory usage of Conv2D
☆92 Updated 9 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆195 Updated 4 months ago
nasosger / MuToR
[NeurIPS '25] Multi-Token Prediction Needs Registers
☆22 Updated last month
howard-hou / RWKV-X
RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…
☆50 Updated 3 months ago
qiuzh20 / gated_attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…
☆95 Updated last month
ruikangliu / Quantized-Reasoning-Models
[COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"
☆55 Updated 3 months ago
thu-ml / TetraJet-MXFP4Training
Pytorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training
☆30 Updated 4 months ago