tilde-research/nsa-release

An efficient implementation of the NSA (Native Sparse Attention) kernel

☆133

Alternatives and similar repositories for nsa-release

Users who are interested in nsa-release are comparing it to the repositories listed below.

zhuzilin / flash-attention-with-sink
☆37 · Aug 7, 2025 · Updated 11 months ago
fla-org / native-sparse-attention
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
☆1,010 · Feb 5, 2026 · Updated 5 months ago
HarryWu99 / funny_cute
Some funny cute/cuteDSL code snippets
☆33 · Mar 2, 2026 · Updated 4 months ago
XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆284 · May 23, 2025 · Updated last year
lemyx / tilelang-dsa
DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang
☆47 · Nov 19, 2025 · Updated 8 months ago
mdy666 / Scalable-Flash-Native-Sparse-Attention
☆48 · Dec 13, 2025 · Updated 7 months ago
HydraQYH / hp_rms_norm
High performance RMSNorm Implement by using SM Core Storage(Registers and Shared Memory)
☆30 · Jan 22, 2026 · Updated 5 months ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆86 · May 5, 2025 · Updated last year
LinB203 / FSDP-Training
Minimal PyTorch implementation of TP, SP, FSDP and sharded-EMA
☆32 · Nov 27, 2025 · Updated 7 months ago
infinigence / HamiltonAttention
☆45 · Oct 15, 2025 · Updated 9 months ago
mit-han-lab / flash-moba
☆250 · Nov 19, 2025 · Updated 8 months ago
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆257 · Jun 15, 2025 · Updated last year
HanGuo97 / log-linear-attention
☆284 · Jun 6, 2025 · Updated last year
HanGuo97 / hilt
☆40 · Dec 14, 2025 · Updated 7 months ago
chenyu-jiang / dcp
Code repository for the SOSP'25 paper DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.
☆21 · Nov 28, 2025 · Updated 7 months ago
zhixuan-lin / forgetting-transformer
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆150 · Feb 25, 2026 · Updated 4 months ago
tilde-research / momoe-release
Memory optimized Mixture of Experts
☆78 · Jul 25, 2025 · Updated 11 months ago
Dao-AILab / grouped-latent-attention
☆135 · May 29, 2025 · Updated last year
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆731 · Jul 4, 2026 · Updated 2 weeks ago
SandAI-org / MagiAttention
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
☆882 · Updated this week
dsl-learn / cutile-learn
NVIDIA cuTile learn
☆169 · Dec 9, 2025 · Updated 7 months ago
tile-ai / TileOPs
High-performance LLM operator library built on TileLang.
☆161 · Updated this week
dayal-kalra / low-memory-adam
☆14 · Mar 2, 2025 · Updated last year
OpenBMB / infllmv2_cuda_impl
☆102 · Feb 11, 2026 · Updated 5 months ago
Yifei-Zuo / Parallax
Official repository for Parallax (Parameterized Local Linear Attention)
☆65 · Jul 7, 2026 · Updated last week
Dao-AILab / AI-workflow
☆71 · Mar 24, 2026 · Updated 3 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆126 · Jan 27, 2026 · Updated 5 months ago
zhehangdu / Newton-Muon
The Newton-Muon optimizer
☆30 · Jun 5, 2026 · Updated last month
shawntan / stickbreaking-attention
Stick-breaking attention
☆63 · Jul 1, 2025 · Updated last year
mdy666 / Qwen-Native-Sparse-Attention
qwen-nsa
☆87 · Oct 14, 2025 · Updated 9 months ago
fla-org / hybrid-distillation
☆34 · Dec 31, 2025 · Updated 6 months ago
feifeibear / ChituAttention
Quantized Attention on GPU
☆45 · Nov 22, 2024 · Updated last year
NVIDIA / tilus
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
☆489 · Jul 5, 2026 · Updated 2 weeks ago
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,060 · Updated this week
ethansmith2000 / TransformerExperiments
☆19 · Dec 4, 2025 · Updated 7 months ago
foundation-model-stack / vllm-triton-backend
A Triton-only attention backend for vLLM
☆27 · Updated this week