mdy666 / mdy_triton
☆112 · Updated this week
Alternatives and similar repositories for mdy_triton:
Users who are interested in mdy_triton are comparing it to the libraries listed below:
- ☆130 · Updated last month
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆75 · Updated last month
- qwen-nsa ☆49 · Updated this week
- The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference ☆69 · Updated 2 months ago
- A sparse attention kernel supporting mixed sparse patterns ☆186 · Updated 2 months ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆179 · Updated 2 months ago
- Blog posts, reading reports, and code examples covering AGI/LLM-related knowledge ☆36 · Updated 2 months ago
- Multi-Candidate Speculative Decoding ☆35 · Updated 11 months ago
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs. ☆108 · Updated 10 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆269 · Updated 4 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs