qwen-nsa
☆87 · Oct 14, 2025 · Updated 5 months ago
Alternatives and similar repositories for Qwen-Native-Sparse-Attention
Users interested in Qwen-Native-Sparse-Attention are comparing it to the libraries listed below.
- ☆48 · Dec 13, 2025 · Updated 3 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" · ☆978 · Feb 5, 2026 · Updated last month
- Efficient Triton implementation of Native Sparse Attention · ☆272 · May 23, 2025 · Updated 10 months ago
- High-performance RMSNorm implementation using SM core storage (registers and shared memory) · ☆30 · Jan 22, 2026 · Updated 2 months ago
- DeepSeek Native Sparse Attention PyTorch implementation · ☆115 · Dec 17, 2025 · Updated 3 months ago
- ☆13 · Jan 7, 2025 · Updated last year
- A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention · ☆289 · Dec 1, 2025 · Updated 3 months ago
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark" · ☆30 · Jun 30, 2025 · Updated 9 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs · ☆198 · Sep 23, 2025 · Updated 6 months ago
- A sparse attention kernel supporting mixed sparse patterns · ☆485 · Jan 18, 2026 · Updated 2 months ago
- Xmixers: A collection of SOTA efficient token/channel mixers · ☆28 · Sep 4, 2025 · Updated 6 months ago
- [ICLR 2025, IEEE TPAMI 2026] Mixture Compressor & MC# · ☆69 · Feb 12, 2025 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ☆247 · Jun 15, 2025 · Updated 9 months ago
- ☆97 · Feb 11, 2026 · Updated last month
- ☆240 · Nov 19, 2025 · Updated 4 months ago
- DeeperGEMM: crazy optimized version · ☆75 · May 5, 2025 · Updated 10 months ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference · ☆58 · Nov 20, 2024 · Updated last year
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling · ☆22 · Mar 18, 2026 · Updated last week
- An efficient implementation of the NSA (Native Sparse Attention) kernel · ☆132 · Jun 24, 2025 · Updated 9 months ago
- Quantized Attention on GPU · ☆44 · Nov 22, 2024 · Updated last year
- Website for CSE 234, Winter 2025 · ☆13 · Mar 24, 2025 · Updated last year
- ☆119 · May 19, 2025 · Updated 10 months ago
- The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity" · ☆30 · Nov 12, 2024 · Updated last year
- Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer · ☆16 · Nov 21, 2024 · Updated last year
- ☆38 · Aug 7, 2025 · Updated 7 months ago
- LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation · ☆33 · Feb 26, 2026 · Updated last month
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang · ☆44 · Nov 19, 2025 · Updated 4 months ago
- Code for the paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference · ☆167 · Oct 13, 2025 · Updated 5 months ago
- Expert Specialization MoE Solution based on CUTLASS · ☆27 · Jan 19, 2026 · Updated 2 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring · ☆274 · Jul 6, 2025 · Updated 8 months ago
- ☆52 · May 19, 2025 · Updated 10 months ago
- ☆110 · Feb 25, 2025 · Updated last year
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference · ☆291 · May 1, 2025 · Updated 10 months ago
- Automated bottleneck detection and solution orchestration · ☆20 · Feb 24, 2026 · Updated last month
- Benchmark tests supporting the TiledCUDA library · ☆18 · Nov 19, 2024 · Updated last year
- Deep Learning Theory course · ☆28 · Jan 3, 2022 · Updated 4 years ago
- ☆14 · Mar 9, 2023 · Updated 3 years ago
- Using FlexAttention to compute attention with different masking patterns · ☆47 · Sep 22, 2024 · Updated last year
- The official code for "Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation" | [MM2… · ☆14 · Dec 7, 2024 · Updated last year