ZichengXu / Decoding-Tree-Sketching
☆63 · Updated last week
Alternatives and similar repositories for Decoding-Tree-Sketching
Users interested in Decoding-Tree-Sketching are comparing it to the libraries listed below.
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ☆86 · Updated 9 months ago
- ☆22 · Updated last year
- Awesome LLM pruning papers all-in-one repository with integrating all useful resources and insights. ☆134 · Updated 3 months ago
- Awesome list for LLM pruning. ☆276 · Updated last month
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆338 · Updated 7 months ago
- ☆49 · Updated last year
- Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS) ☆51 · Updated 8 months ago
- ThinK: Thinner Key Cache by Query-Driven Pruning ☆24 · Updated 9 months ago
- [COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free ☆45 · Updated 7 months ago
- ☆68 · Updated 7 months ago
- ☆290 · Updated 4 months ago
- [EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un… ☆16 · Updated 2 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆335 · Updated last week
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness. ☆52 · Updated 7 months ago
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training. ☆209 · Updated 6 months ago
- Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ☆74 · Updated 4 months ago
- Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding" ☆210 · Updated 9 months ago
- ☆43 · Updated last year
- ☆37 · Updated last week
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆193 · Updated this week
- ☆22 · Updated 2 years ago
- [ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark". ☆118 · Updated 4 months ago
- Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023. ☆89 · Updated last year
- [NeurIPS 2025] The implementation of the paper "On Reasoning Strength Planning in Large Reasoning Models" ☆26 · Updated 4 months ago
- Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process! ☆70 · Updated 8 months ago
- Explorations into some recent techniques surrounding speculative decoding ☆295 · Updated 11 months ago
- [ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models ☆25 · Updated 4 months ago
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling ☆48 · Updated 4 months ago
- A Survey on Data Selection for Language Models ☆252 · Updated 7 months ago
- ☆48 · Updated last year