ZichengXu / Decoding-Tree-Sketching
☆63 · Updated 2 months ago
Alternatives and similar repositories for Decoding-Tree-Sketching
Users interested in Decoding-Tree-Sketching are comparing it to the repositories listed below.
- KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches (EMNLP 2024 Findings) ☆88 · Updated 11 months ago
- ☆22 · Updated last year
- Awesome LLM pruning papers: an all-in-one repository integrating useful resources and insights ☆147 · Updated 6 months ago
- ☆74 · Updated 9 months ago
- Awesome list for LLM pruning ☆282 · Updated 3 months ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆363 · Updated 9 months ago
- ThinK: Thinner Key Cache by Query-Driven Pruning ☆27 · Updated 11 months ago
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness ☆55 · Updated 9 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆356 · Updated 2 months ago
- [ICLR 2025] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models ☆27 · Updated 7 months ago
- ☆49 · Updated last year
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆214 · Updated 11 months ago
- Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ☆80 · Updated 7 months ago
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆201 · Updated 2 months ago
- ☆43 · Updated last year
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning models without training ☆220 · Updated 8 months ago
- [EMNLP 2025] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un… ☆16 · Updated last month
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆229 · Updated last year
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling ☆49 · Updated 6 months ago
- [NeurIPS 2025] Implementation of the paper "On Reasoning Strength Planning in Large Reasoning Models" ☆30 · Updated 7 months ago
- ☆303 · Updated 7 months ago
- Explorations into some recent techniques surrounding speculative decoding ☆299 · Updated last year
- Official implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS) ☆54 · Updated 10 months ago
- Awesome list for LLM quantization ☆390 · Updated 3 months ago
- ☆51 · Updated last year
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark" ☆123 · Updated 7 months ago
- ☆30 · Updated 4 months ago
- Reproducing R1 for Code with Reliable Rewards ☆286 · Updated 9 months ago
- Official repository for the paper "O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning" ☆97 · Updated 11 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆261 · Updated 8 months ago