dhcode-cpp / NSA-pytorchLinks

DeepSeek Native Sparse Attention pytorch implementation

☆107

Alternatives and similar repositories for NSA-pytorch

Users that are interested in NSA-pytorch are comparing it to the libraries listed below

Sorting:

mdy666 / mdy_triton
☆148Updated 4 months ago
MuLabPKU / TransMLA
TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)
☆407Updated last month
microsoft / SeerAttention
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆169Updated last month
pprp / Awesome-Efficient-MoE
Efficient Mixture of Experts for LLM Paper List
☆143Updated last month
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆245Updated 4 months ago
mdy666 / Qwen-Native-Sparse-Attention
qwen-nsa
☆83Updated last month
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆128Updated 2 weeks ago
qingkelab / qingketalk
青稞Talk
☆161Updated this week
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆152Updated last month
haiduo / Jakiro
This repository is the official implementation of "Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE"
☆31Updated last month
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆245Updated 3 months ago
stepfun-ai / Step3
☆435Updated 3 months ago
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆193Updated last month
OpenNLPLab / lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
☆333Updated 8 months ago
sii-research / siiRL
siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems
☆224Updated this week
ifromeast / AI_analysis
analyse problems of AI with Math and Code
☆27Updated 3 months ago
liangyuwang / Tiny-Megatron
Tiny-Megatron, a minimalistic re-implementation of the Megatron library
☆17Updated 2 months ago
OpenSparseLLMs / Linear-MoE
☆120Updated 5 months ago
madsys-dev / deepseekv2-profile
☆151Updated 8 months ago
hyperai / triton-cn
Triton Documentation in Chinese Simplified / Triton 中文文档
☆90Updated 7 months ago
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆106Updated 7 months ago
shreyansh26 / FlashAttention-PyTorch
Implementation of FlashAttention in PyTorch
☆173Updated 10 months ago
FFY0 / AdaKV
The Official Implementation of Ada-KV [NeurIPS 2025]
☆110Updated last month
OpenBMB / infllmv2_cuda_impl
☆72Updated 3 weeks ago
yaof20 / Flash-RL
Implementation for FP8/INT8 Rollout for RL training without performence drop.
☆269Updated last week
NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆676Updated 3 weeks ago
mit-han-lab / Block-Sparse-Attention
A sparse attention kernel supporting mix sparse patterns
☆376Updated 9 months ago
liangyuwang / Tiny-DeepSpeed
Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library
☆48Updated 2 months ago
Dao-AILab / grouped-latent-attention
☆130Updated 5 months ago
XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆247Updated 5 months ago