fla-org / fla-zoo
Flash-Linear-Attention models beyond language
☆11 · Updated this week
Alternatives and similar repositories for fla-zoo:
Users interested in fla-zoo are comparing it to the libraries listed below.
- ☆30 · Updated 10 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆46 · Updated 2 months ago (sketch after the list)
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …" ☆12 · Updated 2 months ago
- ☆19 · Updated last month
- ☆22 · Updated last year
- Continuous batching and parallel acceleration for RWKV6 ☆24 · Updated 9 months ago
- ☆39 · Updated last month
- Stick-breaking attention ☆52 · Updated last month (sketch after the list)
- [ICLR 2024] Official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…" ☆26 · Updated last year
- Transformers components but in Triton ☆32 · Updated last month
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆14 · Updated last year
- ☆19 · Updated 3 months ago
- Implementation of the model "Hedgehog" from the paper "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry" ☆13 · Updated last year
- Xmixers: A collection of SOTA efficient token/channel mixers ☆11 · Updated 5 months ago
- ☆14 · Updated 2 years ago
- Here we will test various linear attention designs. ☆60 · Updated 11 months ago
- LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification ☆45 · Updated last month
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆30 · Updated 10 months ago (sketch after the list)
- Self-reproduced code for the paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention" (MIT CSAIL) ☆12 · Updated 10 months ago (sketch after the list)
- ☆17 · Updated last week
- The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" ☆45 · Updated 6 months ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 6 months ago
- Triton version of GQA flash attention, based on the tutorial ☆11 · Updated 8 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆91 · Updated last week (sketch after the list)
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆37 · Updated 6 months ago
- Open-Pandora: On-the-fly Control Video Generation ☆34 · Updated 4 months ago
- Awesome Triton Resources ☆24 · Updated 2 weeks ago
- Contextual Position Encoding, but with some custom CUDA kernels: https://arxiv.org/abs/2405.18719 ☆22 · Updated 10 months ago (sketch after the list)
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆26 · Updated last week
- ☆21 · Updated last year
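
On the bi-directional (non-causal) linear attention repo above: without a causal mask, linear attention collapses into two dense matrix products, which is exactly what makes a fused Triton kernel attractive. A minimal PyTorch sketch of the math (the elu+1 feature map and all names here are illustrative assumptions, not the repo's API):

```python
import torch

def noncausal_linear_attention(q, k, v, eps=1e-6):
    """Bi-directional (non-causal) linear attention.

    q, k: (batch, heads, seq, dim); v: (batch, heads, seq, dim_v).
    With no causal mask, softmax(QK^T)V is replaced by
    phi(Q) @ (phi(K)^T V), costing O(n * d^2) instead of O(n^2 * d).
    """
    phi_q = torch.nn.functional.elu(q) + 1.0  # positive feature map (one common choice)
    phi_k = torch.nn.functional.elu(k) + 1.0
    kv = torch.einsum("bhnd,bhne->bhde", phi_k, v)   # (d, d_v) summary over all positions
    z = phi_k.sum(dim=2)                             # normalizer: sum of mapped keys
    out = torch.einsum("bhnd,bhde->bhne", phi_q, kv)
    denom = torch.einsum("bhnd,bhd->bhn", phi_q, z).unsqueeze(-1)
    return out / (denom + eps)

q = k = v = torch.randn(2, 4, 128, 64)
print(noncausal_linear_attention(q, k, v).shape)  # torch.Size([2, 4, 128, 64])
```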
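
On stick-breaking attention: the idea is to replace softmax with a stick-breaking process in which each past key, from nearest to farthest, claims a sigmoid-sized share of the remaining probability mass, giving a built-in recency bias. The repo provides fused kernels; the dense PyTorch below is only my reimplementation of the formula for clarity:

```python
import torch
import torch.nn.functional as F

def stick_breaking_attention(q, k, v):
    """Stick-breaking attention weights, computed densely for clarity.
    For query i and key j < i:
        A_ij = sigmoid(z_ij) * prod_{j < m < i} (1 - sigmoid(z_im))
    so weights are causal by construction and need no softmax.
    Shapes: q, k -> (n, d); v -> (n, d_v)."""
    n, d = q.shape
    z = q @ k.t() / d ** 0.5
    log_beta = F.logsigmoid(z)                     # log sigmoid(z_ij)
    log_rest = F.logsigmoid(-z)                    # log (1 - sigmoid(z_ij))
    strict = torch.arange(n)[None, :] < torch.arange(n)[:, None]  # keys with j < i
    log_rest = log_rest.masked_fill(~strict, 0.0)
    csum = log_rest.cumsum(dim=-1)
    # sum_{j < m < i} log(1 - beta_im) = (row total) - (prefix sum up to j)
    log_A = log_beta + (csum[:, -1:] - csum)
    A = torch.exp(log_A).masked_fill(~strict, 0.0)
    return A @ v

q = k = v = torch.randn(8, 16)
print(stick_breaking_attention(q, k, v).shape)  # torch.Size([8, 16])
```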
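
On the "When Linear Attention Meets Autoregressive Decoding" entry: the practical payoff of linearizing an LLM shows up at decode time, where the growing KV cache is replaced by a fixed-size running state. The paper's method involves more than this; the sketch below is just the standard linear-attention decode recurrence, with illustrative names:

```python
import torch

def linear_attention_decode_step(state, z, q_t, k_t, v_t, eps=1e-6):
    """One autoregressive decoding step of causal linear attention.
    Instead of a growing KV cache, a fixed-size state is updated:
        S_t = S_{t-1} + phi(k_t) v_t^T,   z_t = z_{t-1} + phi(k_t)
        o_t = phi(q_t) S_t / (phi(q_t) . z_t)
    q_t, k_t: (dim,); v_t: (dim_v,); state: (dim, dim_v); z: (dim,)."""
    phi = lambda x: torch.nn.functional.elu(x) + 1.0
    pk, pq = phi(k_t), phi(q_t)
    state = state + torch.outer(pk, v_t)
    z = z + pk
    out = (pq @ state) / (pq @ z + eps)
    return out, state, z

d, dv = 64, 64
state, z = torch.zeros(d, dv), torch.zeros(d)
for _ in range(5):  # memory stays O(d * dv), independent of sequence length
    q_t, k_t, v_t = torch.randn(d), torch.randn(d), torch.randn(dv)
    out, state, z = linear_attention_decode_step(state, z, q_t, k_t, v_t)
print(out.shape)  # torch.Size([64])
```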
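
On the Cross-Layer Attention (CLA) reproduction: the idea is that some layers skip their own key/value projections and attend over K/V tensors produced by an earlier layer, so only that earlier layer's K/V must be cached. A toy single-head sketch (module and argument names are mine, not the paper's or the repo's):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLALayer(nn.Module):
    """Toy sketch of Cross-Layer Attention: a layer with shares_kv=True has
    no K/V projections of its own and reuses the K/V of the layer below."""
    def __init__(self, d, shares_kv=False):
        super().__init__()
        self.q_proj = nn.Linear(d, d)
        self.shares_kv = shares_kv
        if not shares_kv:
            self.k_proj = nn.Linear(d, d)
            self.v_proj = nn.Linear(d, d)

    def forward(self, x, kv=None):
        q = self.q_proj(x)
        if self.shares_kv:
            k, v = kv                          # borrow K/V from the layer below
        else:
            k, v = self.k_proj(x), self.v_proj(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v, (k, v)

x = torch.randn(2, 16, 64)
layer0 = CLALayer(64)                  # computes (and would cache) K/V
layer1 = CLALayer(64, shares_kv=True)  # stores no K/V of its own
h, kv = layer0(x)
h, _ = layer1(h, kv=kv)
print(h.shape)  # torch.Size([2, 16, 64])
```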
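
On the Forgetting Transformer repo: the paper biases the softmax attention logits with accumulated log forget gates, so distant keys are smoothly down-weighted in a data-dependent way. A naive (non-fused) sketch of that bias, with illustrative names:

```python
import torch
import torch.nn.functional as F

def forgetting_attention(q, k, v, f_logit):
    """Sketch of forget-gated softmax attention: a scalar gate f_t in (0, 1)
    per position biases the logits by the log-forget accumulated between
    key j and query i:
        score_ij = q_i . k_j / sqrt(d) + sum_{l=j+1..i} log f_l
    q, k: (n, d); v: (n, d_v); f_logit: (n,) pre-sigmoid gate values."""
    n, d = q.shape
    log_f = F.logsigmoid(f_logit)                  # log of the forget gate
    c = log_f.cumsum(0)                            # c_i = sum_{l<=i} log f_l
    decay = c[:, None] - c[None, :]                # sum_{l=j+1..i} log f_l
    scores = q @ k.t() / d ** 0.5 + decay
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(8, 16)
print(forgetting_attention(q, k, v, torch.randn(8)).shape)  # torch.Size([8, 16])
```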
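
On the Contextual Position Encoding (CoPE) entry: positions are not token counts but sums of sigmoid gates between the query and each key, so the model can learn to count, say, words or sentences instead of tokens. A sketch of just the position computation (the repo's CUDA kernels fuse this; the fractional positions are used in the paper to index a learned embedding table with interpolation, omitted here):

```python
import torch

def cope_positions(q, k):
    """Gate-based positions: p_ij = sum_{t=j..i} sigmoid(q_i . k_t).
    Returns fractional positions, shape (n, n), zero where j > i.
    q, k: (n, d)."""
    n = q.shape[0]
    gates = torch.sigmoid(q @ k.t())       # g_it in (0, 1)
    causal = torch.tril(torch.ones(n, n))
    gates = gates * causal                 # only count tokens t <= i
    # suffix sums along the key axis give sum over t in [j, i]
    p = gates.flip(-1).cumsum(-1).flip(-1)
    return p * causal

q = k = torch.randn(6, 16)
print(cope_positions(q, k))
```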