This is the source code for our ICML 2025 paper, "Accelerating Large Language Model Reasoning via Speculative Search".
Alternatives and similar repositories for LLMReasoning-SpecSearch
Users interested in LLMReasoning-SpecSearch are comparing it to the repositories listed below.
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch…
- ChiPBench: Benchmarking End-to-End Performance of AI-based Chip Placement Algorithms
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition
- Channel pruning for accelerating very deep neural networks
- LLM quantization toolkit
- Reading notes on speculative decoding papers
- TPAMI 2025 survey paper
- PyTorch implementation of our paper accepted by NeurIPS 2022 -- Learning Best Combination for Efficient N:M Sparsity
- BESA is a differentiable weight pruning technique for large language models.
- This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act…
- Evolutionary Algorithms and Large Language Models
- [ICML 2025] KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
- Here is the Feiyue handbook for all ECE students, including 1) how to prepare for your application, 2) official program organized by HUST…
- [ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models".
- PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation [NeurIPS 2025]
- Code for KDD 2023 long paper: MetricPrompt: Prompting Model as a Relevance Metric for Few-Shot Text Classification
- ThinK: Thinner Key Cache by Query-Driven Pruning
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
- AbstainQA, ACL 2024
- Implementation of "Effective Sparsification of Neural Networks with Global Sparsity Constraint"
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)
- [ICLR 2025] Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023)
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
- PyTorch implementation of TPAMI 2022 -- 1xN Pattern for Pruning Convolutional Neural Networks
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache
- Nachos labs for the XMU operating systems course
- Awesome list for LLM pruning.