mdy666 / Qwen-Native-Sparse-Attention
qwen-nsa
★ 87 · Oct 14, 2025 · Updated 4 months ago
Alternatives and similar repositories for Qwen-Native-Sparse-Attention
Users interested in Qwen-Native-Sparse-Attention are comparing it to the libraries listed below.
- ★ 48 · Dec 13, 2025 · Updated 2 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" (see the sketch after this list) · ★ 965 · Feb 5, 2026 · Updated last week
- ★ 13 · Jan 7, 2025 · Updated last year
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference · ★ 56 · Nov 20, 2024 · Updated last year
- A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention · ★ 281 · Dec 1, 2025 · Updated 2 months ago
- A PyTorch implementation of DeepSeek Native Sparse Attention · ★ 114 · Dec 17, 2025 · Updated last month
- Xmixers: A collection of SOTA efficient token/channel mixers · ★ 28 · Sep 4, 2025 · Updated 5 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ★ 235 · Jun 15, 2025 · Updated 8 months ago
- DeeperGEMM: crazy optimized version · ★ 74 · May 5, 2025 · Updated 9 months ago
- A sparse attention kernel supporting mixed sparse patterns · ★ 455 · Jan 18, 2026 · Updated 3 weeks ago
- ★ 221 · Nov 19, 2025 · Updated 2 months ago
- ★ 14 · Mar 9, 2023 · Updated 2 years ago
- Automated bottleneck detection and solution orchestration · ★ 19 · Updated this week
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark" · ★ 29 · Jun 30, 2025 · Updated 7 months ago
- Website for CSE 234, Winter 2025 · ★ 13 · Mar 24, 2025 · Updated 10 months ago
- Expert Specialization MoE Solution based on CUTLASS · ★ 27 · Jan 19, 2026 · Updated 3 weeks ago
- A collection of hand-written LLM code · ★ 19 · Mar 25, 2025 · Updated 10 months ago
- High-performance RMSNorm implementation using SM core storage (registers and shared memory) · ★ 26 · Jan 22, 2026 · Updated 3 weeks ago
- [ICLR 2025] Mixture Compressor for Mixture-of-Experts LLMs Gains More · ★ 66 · Feb 12, 2025 · Updated last year
- ★ 52 · May 19, 2025 · Updated 8 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel · ★ 129 · Jun 24, 2025 · Updated 7 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring · ★ 269 · Jul 6, 2025 · Updated 7 months ago
- ★ 42 · Jan 24, 2026 · Updated 3 weeks ago
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs · ★ 61 · Mar 25, 2025 · Updated 10 months ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… · ★ 30 · Oct 21, 2024 · Updated last year
- LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation · ★ 33 · Oct 11, 2025 · Updated 4 months ago
- Implement Flash Attention using Cute. · ★ 100 · Dec 17, 2024 · Updated last year
- ★ 106 · Feb 25, 2025 · Updated 11 months ago
- Code for the paper [ICLR 2025 Oral] "FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference" · ★ 160 · Oct 13, 2025 · Updated 4 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling · ★ 22 · Feb 9, 2026 · Updated last week
- ★ 118 · May 19, 2025 · Updated 8 months ago
- PyTorch implementation of the Flash Spectral Transform Unit · ★ 21 · Sep 19, 2024 · Updated last year
- Wave: Python Domain-Specific Language for High Performance Machine Learning · ★ 44 · Updated this week
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… · ★ 101 · Aug 25, 2025 · Updated 5 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… · ★ 266 · Updated this week
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang · ★ 43 · Nov 19, 2025 · Updated 2 months ago
- ★ 22 · May 5, 2025 · Updated 9 months ago
- Vortex: A Flexible and Efficient Sparse Attention Framework · ★ 46 · Jan 21, 2026 · Updated 3 weeks ago
- [NeurIPS 2024] Search for Efficient LLMs · ★ 16 · Jan 16, 2025 · Updated last year
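
Several entries above (the 🐳 Triton NSA kernels, the PyTorch NSA implementation, the ★ 129 NSA kernel) revolve around the same block-selection idea from the NSA paper. As a rough orientation only, the sketch below shows that top-k block-selection step in plain PyTorch. All names, the mean-pooling compression, and the shapes are illustrative assumptions, not code from any listed repository; NSA proper uses a learned compression plus a sliding-window branch, and the repositories above fuse this logic into hardware-aligned Triton/CUDA kernels.

```python
# Minimal sketch of NSA-style top-k block selection (illustrative only).
# Causal masking and multi-head/batch dims are omitted for brevity.
import torch
import torch.nn.functional as F

def topk_block_sparse_attention(q, k, v, block_size=64, top_k=4):
    """q: [T, D]; k, v: [S, D]. Hypothetical helper, not a library API."""
    T, D = q.shape
    n_blocks = k.size(0) // block_size
    # Group keys/values into contiguous blocks; trailing remainder is dropped here.
    k_blocks = k[: n_blocks * block_size].view(n_blocks, block_size, D)
    v_blocks = v[: n_blocks * block_size].view(n_blocks, block_size, D)
    # Compress each key block to one representative. NSA learns this
    # compression; mean pooling is an assumption for the sketch.
    k_summary = k_blocks.mean(dim=1)                       # [n_blocks, D]
    # Score every block per query and keep only the top-k blocks.
    block_scores = (q @ k_summary.T) / D**0.5              # [T, n_blocks]
    top_idx = block_scores.topk(min(top_k, n_blocks), dim=-1).indices
    # Gather the selected blocks and run dense attention inside them.
    k_sel = k_blocks[top_idx].reshape(T, -1, D)            # [T, top_k*block_size, D]
    v_sel = v_blocks[top_idx].reshape(T, -1, D)
    attn = F.softmax(
        (q.unsqueeze(1) @ k_sel.transpose(1, 2)).squeeze(1) / D**0.5, dim=-1
    )                                                      # [T, top_k*block_size]
    return (attn.unsqueeze(1) @ v_sel).squeeze(1)          # [T, D]

# Toy usage: 128 queries over 1024 keys with 64-dim heads.
q, k, v = torch.randn(128, 64), torch.randn(1024, 64), torch.randn(1024, 64)
out = topk_block_sparse_attention(q, k, v)                 # -> [128, 64]
```

Selecting whole blocks rather than individual tokens is what makes this "hardware-aligned": each chosen block is a contiguous memory region, so a fused kernel can load it coalesced, which is the part the Triton/CUDA implementations listed above optimize.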