dhcode-cpp / NSA-pytorch
A PyTorch implementation of DeepSeek's Native Sparse Attention (NSA)
☆93 · Updated last month
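For context when comparing the repositories below, here is a minimal single-head sketch of the Native Sparse Attention idea this repo implements: three attention branches (block-compressed, top-n selected, and sliding-window) whose outputs are mixed by per-branch gates. The function name, shapes, mean-pooling compressor, and fixed equal gate are illustrative assumptions, not this repo's actual API; the paper uses a learned compressor and a learned, query-dependent gate.

```python
import torch
import torch.nn.functional as F

def nsa_sketch(q, k, v, block=64, top_n=4, window=128):
    """Hypothetical single-head NSA step for one query over a KV history.
    q: (1, d); k, v: (T, d), with T assumed to be a multiple of `block`."""
    T, d = k.shape
    nb = T // block

    # Compressed branch: pool each KV block down to one token, then attend.
    # (The paper learns this compression; mean-pooling is a stand-in.)
    kc = k.view(nb, block, d).mean(1)                 # (nb, d)
    vc = v.view(nb, block, d).mean(1)
    scores_c = (q @ kc.T) / d**0.5                    # (1, nb)
    out_cmp = F.softmax(scores_c, dim=-1) @ vc

    # Selected branch: reuse the compressed scores to pick the top-n blocks,
    # then attend at full resolution inside only those blocks.
    top = scores_c.topk(min(top_n, nb), dim=-1).indices[0]
    sel = torch.cat([torch.arange(i * block, (i + 1) * block)
                     for i in top.tolist()])
    out_sel = F.softmax((q @ k[sel].T) / d**0.5, dim=-1) @ v[sel]

    # Sliding-window branch: attend to the most recent `window` tokens.
    kw, vw = k[-window:], v[-window:]
    out_win = F.softmax((q @ kw.T) / d**0.5, dim=-1) @ vw

    # Branch mixing: fixed equal gates here; NSA applies a learned,
    # query-dependent gate (an MLP on q) to each branch output.
    return (out_cmp + out_sel + out_win) / 3

q, k, v = torch.randn(1, 64), torch.randn(1024, 64), torch.randn(1024, 64)
print(nsa_sketch(q, k, v).shape)  # torch.Size([1, 64])
```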
Alternatives and similar repositories for NSA-pytorch
Users interested in NSA-pytorch are also comparing it to the repositories listed below.
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆328 · Updated 6 months ago
- TransMLA: Multi-Head Latent Attention Is All You Need ☆353 · Updated 2 weeks ago
- Efficient Mixture of Experts for LLM Paper List ☆124 · Updated this week
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆151 · Updated last month
- Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library ☆45 · Updated 3 weeks ago
- qwen-nsa ☆74 · Updated 5 months ago
- ☆143 · Updated 2 months ago
- ☆423 · Updated last month
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆453 · Updated last week
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆239 · Updated last month
- Implementation of FlashAttention in PyTorch ☆166 · Updated 8 months ago
- A sparse attention kernel supporting mixed sparse patterns ☆296 · Updated 7 months ago
- 青稞Talk ☆145 · Updated this week
- ☆49 · Updated this week
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆226 · Updated 2 months ago
- ☆118 · Updated 3 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆188 · Updated 2 months ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆111 · Updated 5 months ago
- ☆147 · Updated 6 months ago
- Triton documentation in Simplified Chinese / Triton 中文文档 ☆81 · Updated 5 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆847 · Updated 5 months ago
- [CoLM 2025] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆146 · Updated 2 months ago
- Implementation of FP8/INT8 rollout for RL training without performance drop ☆200 · Updated this week
- Tiny-Megatron, a minimalistic re-implementation of the Megatron library ☆16 · Updated 2 weeks ago
- ☆42 · Updated last year
- Code for the paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆140 · Updated 3 months ago
- 16-fold memory access reduction with nearly no loss ☆105 · Updated 5 months ago
- ☆126 · Updated 3 months ago
- ☆198 · Updated 5 months ago
- [ICML 2025] Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization ☆91 · Updated 3 months ago