PiotrNawrot / nano-sparse-attention

The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.

☆92

Alternatives and similar repositories for nano-sparse-attention

Users interested in nano-sparse-attention are comparing it to the libraries listed below.


PiotrNawrot / dynamic-pooling
Efficient Transformers with Dynamic Token Pooling
☆67 · May 20, 2023 · Updated 2 years ago
Doraemonzzz / nanoTransNormer
☆11 · Oct 11, 2023 · Updated 2 years ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆119 · Jan 27, 2026 · Updated 3 weeks ago
yikangshen / megablocks
☆20 · May 30, 2024 · Updated last year
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆33 · Aug 14, 2024 · Updated last year
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆129 · Jun 24, 2025 · Updated 7 months ago
thunlp / APB
Official Implementation of APB (ACL 2025 main Oral) and Spava.
☆33 · Jan 30, 2026 · Updated 2 weeks ago
bdusell / stack-attention
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18 · Mar 15, 2024 · Updated last year
yuzhaouoe / pretraining-data-packing
[ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training
☆23 · Aug 18, 2024 · Updated last year
recursal / RADLADS-paper
RADLADS training code
☆37 · May 7, 2025 · Updated 9 months ago
OpenNLPLab / ETSC-Exact-Toeplitz-to-SSM-Conversion
[EMNLP 2023] Official implementation of ETSC (Exact Toeplitz-to-SSM Conversion), the algorithm from our EMNLP 2023 paper: Accelerating Toeplitz…
☆14 · Oct 17, 2023 · Updated 2 years ago
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24 · Jun 6, 2024 · Updated last year
iwiwi / epochraft
Checkpointable dataset utilities for foundation model training
☆32 · Jan 29, 2024 · Updated 2 years ago
AlirezaMorsali / MLP-Attention
☆16 · Dec 19, 2024 · Updated last year
HazyResearch / train-tk
train with kittens!
☆63 · Oct 25, 2024 · Updated last year
srush / tangent
Source-to-Source Debuggable Derivatives in Pure Python
☆15 · Jan 23, 2024 · Updated 2 years ago
automl / unlocking_state_tracking
Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…
☆20 · Mar 15, 2025 · Updated 11 months ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 · Oct 5, 2024 · Updated last year
zyqCSL / DiffKV
☆37 · Oct 11, 2025 · Updated 4 months ago
cvenhoff / steering-thinking-llms
☆33 · Jul 9, 2025 · Updated 7 months ago
VITA-Group / Q-Hitter
☆15 · Jun 4, 2024 · Updated last year
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆42 · Dec 29, 2025 · Updated last month
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆39 · Apr 27, 2025 · Updated 9 months ago
mit-han-lab / duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆524 · Feb 10, 2025 · Updated last year
machelreid / editpro
Learning to Model Editing Processes
☆26 · Aug 3, 2025 · Updated 6 months ago
hpcgroup / loki
Algorithms for approximate attention in LLMs
☆21 · Apr 14, 2025 · Updated 10 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆89 · Oct 30, 2024 · Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆56 · Aug 20, 2024 · Updated last year
dame-cell / Triformer
Transformer components, but in Triton
☆34 · May 9, 2025 · Updated 9 months ago
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆40 · Dec 2, 2023 · Updated 2 years ago
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆17 · Jan 2, 2024 · Updated 2 years ago
subho406 / agalite
AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning (Published in TMLR)
☆23 · Oct 15, 2024 · Updated last year
facebookresearch / iGSM
The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces…
☆84 · Jan 12, 2025 · Updated last year
ethansmith2000 / TransformerExperiments
☆19 · Dec 4, 2025 · Updated 2 months ago
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆99 · Apr 2, 2024 · Updated last year
fla-org / flash-bidirectional-linear-attention
Triton implementation of bidirectional (non-causal) linear attention
☆65 · Feb 2, 2026 · Updated 2 weeks ago
thunlp / SparsingLaw
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆30 · Nov 12, 2024 · Updated last year
formll / resolving-scaling-law-discrepancies
☆20 · Nov 4, 2025 · Updated 3 months ago
GradientHQ / symphony
Symphony: a decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi…
☆30 · Oct 30, 2025 · Updated 3 months ago
