HanGuo97 / log-linear-attention

☆232

Alternatives and similar repositories for log-linear-attention

Users who are interested in log-linear-attention compare it to the libraries listed below.

tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆110Updated last month
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆193Updated 4 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆152Updated last month
XunhaoLai / native-sparse-attention-triton
Efficient Triton implementation of Native Sparse Attention.
☆186Updated 2 months ago
sustcsonglin / linear-attention-and-beyond-slides
☆79Updated 5 months ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆220Updated last month
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆213Updated last month
Dao-AILab / grouped-latent-attention
☆123Updated 2 months ago
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆69Updated 5 months ago
alexzhang13 / flashattention2-custom-mask
Triton implementation of FlashAttention2 that adds Custom Masks.
☆128Updated 11 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆88Updated last month
hao-ai-lab / Awesome-Video-Attention
A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach…
☆32Updated 2 weeks ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆127Updated 8 months ago
zhixuan-lin / forgetting-transformer
[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
☆118Updated last month
mit-han-lab / patch_conv
Patch convolution to avoid large GPU memory usage of Conv2D
☆92Updated 6 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆185Updated 8 months ago
sandyresearch / chipmunk
🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× …
☆78Updated last month
z-lab / sparselora
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆49Updated last month
fla-org / flash-bidirectional-linear-attention
Triton implementation of bi-directional (non-causal) linear attention
☆52Updated 6 months ago
thu-ml / ReMoE
[ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆85Updated 7 months ago
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆42Updated last month
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆216Updated last year
NX-AI / flashrnn
FlashRNN - Fast RNN Kernels with I/O Awareness
☆93Updated last month
mit-han-lab / Block-Sparse-Attention
A sparse attention kernel supporting mixed sparse patterns
☆262Updated 5 months ago
yuezhouhu / 2by4-pretrain
Efficient 2:4 sparse training algorithms and implementations
☆56Updated 8 months ago
horseee / dKV-Cache
☆89Updated 2 months ago
test-time-training / ttt-tk
☆39Updated 4 months ago
OpenSparseLLMs / Linearization
☆54Updated last month
OpenMachine-ai / transformer-tricks
A collection of tricks and tools to speed up transformer models
☆169Updated 2 months ago
gpu-mode / ring-attention
ring-attention experiments
☆146Updated 9 months ago