mdy666 / Qwen-Native-Sparse-Attention

qwen-nsa

☆83

Alternatives and similar repositories for Qwen-Native-Sparse-Attention

Users who are interested in Qwen-Native-Sparse-Attention are comparing it to the libraries listed below.

pprp / Awesome-Efficient-MoE
Efficient Mixture of Experts for LLM Paper List
☆143 Updated last month
OpenSparseLLMs / Linear-MoE
☆120 Updated 5 months ago
microsoft / SeerAttention
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆169 Updated last month
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆245 Updated 3 months ago
mdy666 / mdy_triton
☆148 Updated 4 months ago
yaof20 / Flash-RL
Implementation for FP8/INT8 Rollout for RL training without performance drop.
☆269 Updated last week
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆245 Updated 4 months ago
maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…
☆176 Updated 2 months ago
OpenBMB / infllmv2_cuda_impl
☆72 Updated 3 weeks ago
MiroMindAI / MiroRL
MiroRL is an MCP-first reinforcement learning framework for deep research agents.
☆170 Updated 2 months ago
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆128 Updated 2 weeks ago
FFY0 / AdaKV
The Official Implementation of Ada-KV [NeurIPS 2025]
☆110 Updated last month
sail-sg / LongSpec
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆67 Updated 4 months ago
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆193 Updated last month
dhcode-cpp / NSA-pytorch
DeepSeek Native Sparse Attention pytorch implementation
☆107 Updated last week
OpenSparseLLMs / Linearization
☆61 Updated 4 months ago
OpenSparseLLMs / MoM
☆106 Updated 2 months ago
sii-research / siiRL
siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems
☆224 Updated this week
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆152 Updated last month
GAIR-NLP / MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
☆144 Updated 7 months ago
rlite-project / RLite
A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm…
☆75 Updated 2 months ago
step-law / steplaw
☆205 Updated 2 weeks ago
UNITES-Lab / MC-SMoE
[ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆97 Updated 4 months ago
ruikangliu / IntactKV
[ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact"
☆48 Updated last year
liangyuwang / Tiny-DeepSpeed
Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library
☆48 Updated 2 months ago
dilab-zju / self-speculative-decoding
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
☆208 Updated 9 months ago
XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆247 Updated 5 months ago
thu-nics / MoA
[CoLM'25] The official implementation of the paper <MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression>
☆150 Updated 4 months ago
JieShibo / MoLE
[ICML 2025 Oral] Mixture of Lookup Experts
☆54 Updated 6 months ago
mutonix / pyramidinfer
☆48 Updated 11 months ago