mdy666 / qwen-nsa
☆42 · Updated last week
Alternatives and similar repositories for qwen-nsa:
Users interested in qwen-nsa are comparing it to the libraries listed below.
- ☆70 · Updated 2 weeks ago
- More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression ☆11 · Updated 2 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆88 · Updated last week
- ☆71 · Updated last week
- ☆102 · Updated last week
- Code for paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆62 · Updated last week
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆98 · Updated 2 weeks ago
- Efficient Triton implementation of Native Sparse Attention ☆116 · Updated this week
- ☆139 · Updated 2 weeks ago
- The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference ☆67 · Updated 2 months ago
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆102 · Updated this week
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference" ☆92 · Updated 4 months ago
- LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification ☆42 · Updated 3 weeks ago
- ☆36 · Updated last week
- A sparse attention kernel supporting mixed sparse patterns ☆168 · Updated last month
- ☆39 · Updated 4 months ago
- ☆18 · Updated last week
- ☆9 · Updated 6 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆168 · Updated last month
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆170 · Updated 3 weeks ago
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆52 · Updated last month
- Official PyTorch implementation of IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact ☆43 · Updated 10 months ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆62 · Updated 2 weeks ago
- SQUEEZED ATTENTION: Accelerating Long Prompt LLM Inference ☆45 · Updated 4 months ago
- Efficient Mixture of Experts for LLM Paper List ☆47 · Updated 3 months ago
- ☆19 · Updated 3 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆40 · Updated 5 months ago
- PyTorch implementation of DeepSeek Native Sparse Attention ☆46 · Updated 3 weeks ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆75 · Updated 3 months ago
- Code for paper "Patch-Level Training for Large Language Models" ☆81 · Updated 4 months ago