An efficient implementation of the NSA (Native Sparse Attention) kernel
☆129 · Jun 24, 2025 · Updated 8 months ago
Alternatives and similar repositories for nsa-impl
Users interested in nsa-impl are comparing it to the libraries listed below.
- ☆15 · Mar 2, 2025 · Updated last year
- ☆19 · Dec 4, 2025 · Updated 2 months ago
- FlexAttention w/ FlashAttention3 Support · ☆27 · Oct 5, 2024 · Updated last year
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" · ☆969 · Feb 5, 2026 · Updated 3 weeks ago
- Benchmark tests supporting the TiledCUDA library. · ☆18 · Nov 19, 2024 · Updated last year
- Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning · ☆29 · Sep 12, 2025 · Updated 5 months ago
- Efficient Triton implementation of Native Sparse Attention. · ☆268 · May 23, 2025 · Updated 9 months ago
- DeeperGEMM: crazy optimized version · ☆74 · May 5, 2025 · Updated 9 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. · ☆91 · Jul 17, 2025 · Updated 7 months ago
- ☆20 · May 30, 2024 · Updated last year
- ☆134 · May 29, 2025 · Updated 9 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" · ☆92 · Oct 30, 2024 · Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning · ☆140 · Updated this week
- A simple API to use CUPTI