MIRALab-USTC / LLMReasoning-SpecSearch
This is the source code of our ICML25 paper, titled "Accelerating Large Language Model Reasoning via Speculative Search".
☆22 · Updated 8 months ago
Alternatives and similar repositories for LLMReasoning-SpecSearch
Users who are interested in LLMReasoning-SpecSearch are comparing it to the repositories listed below.
- This is the code for our ICLR 2025 paper, titled "Computing Circuits Optimization via Model-Based Circuit Genetic Evolution". ☆12 · Updated 8 months ago
- [WSDM'24 Oral] The official implementation of the paper "DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting". ☆23 · Updated last year
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low-Rank Decomposition. ☆17 · Updated 9 months ago
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA'25). ☆70 · Updated 9 months ago
- ☆34 · Updated 10 months ago
- Curated collection of papers on MoE model inference. ☆341 · Updated 3 months ago
- ☆113 · Updated 2 years ago
- ☆224 · Updated 3 months ago
- Code repository of "Evaluating Quantized Large Language Models". ☆136 · Updated last year
- ☆145 · Updated last month
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… ☆281 · Updated 2 months ago
- ☆26 · Updated last year
- Awesome list for LLM pruning. ☆282 · Updated 4 months ago
- Official code implementation for the ICLR 2025 accepted paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives". ☆50 · Updated 3 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference. ☆372 · Updated 7 months ago
- Reading notes on speculative decoding papers. ☆21 · Updated 2 months ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24). ☆53 · Updated last year
- An implementation of the DISP-LLM method from the NeurIPS 2024 paper "Dimension-Independent Structural Pruning for Large Language Models". ☆23 · Updated 6 months ago
- Awesome LLM pruning papers, an all-in-one repository integrating useful resources and insights. ☆147 · Updated 6 months ago
- SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs (ICML 2025). ☆32 · Updated 2 months ago
- ☆54 · Updated last year
- Using an LLM to evaluate the MMLU dataset. ☆42 · Updated last year
- Some docs for rookies in nics-efc. ☆22 · Updated 3 years ago
- Code release for AdapMoE, accepted by ICCAD 2024. ☆35 · Updated 9 months ago
- [DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive La… ☆81 · Updated last year
- [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning. ☆122 · Updated last year
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆658 · Updated 4 months ago
- Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Codes. ☆411 · Updated 11 months ago
- [ICLR 2025] OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt… ☆88 · Updated 10 months ago
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24). ☆25 · Updated last year