MIRALab-USTC / LLMReasoning-SpecSearch
This is the source code of our ICML 2025 paper, titled "Accelerating Large Language Model Reasoning via Speculative Search".
☆22 · Updated 8 months ago
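The paper is named above only by title; for orientation, here is a minimal, hypothetical sketch of the generic draft-then-verify loop that speculative decoding/search methods build on. Every name in it (`draft_step`, `score_step`, the acceptance threshold) is an illustrative stand-in, not the paper's actual algorithm or API.

```python
# Hypothetical sketch of the draft-then-verify pattern behind speculative
# approaches to LLM reasoning. The stubs below stand in for real models.
import random

def draft_step(prefix: str, k: int = 4) -> list[str]:
    """A cheap draft model proposes k candidate next reasoning steps (stub)."""
    return [f"{prefix} -> step{random.randint(0, 99)}" for _ in range(k)]

def score_step(candidate: str) -> float:
    """A stronger target/reward model scores a candidate step (stub)."""
    return random.random()

def speculative_search(question: str, depth: int = 3, threshold: float = 0.5) -> str:
    """Extend the reasoning chain with the best-scoring draft the verifier
    accepts; keep the current prefix unchanged when all drafts are rejected."""
    prefix = question
    for _ in range(depth):
        # Score each cheaply drafted candidate once with the verifier.
        scored = [(score_step(c), c) for c in draft_step(prefix)]
        best_score, best = max(scored)
        if best_score >= threshold:
            prefix = best  # accept the speculated step
    return prefix

print(speculative_search("Q: 2+2?"))
```

The point of the pattern is that the expensive model only verifies a handful of cheap drafts per step instead of generating every step itself, which is where the speedup comes from.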
Alternatives and similar repositories for LLMReasoning-SpecSearch
Users interested in LLMReasoning-SpecSearch are comparing it to the libraries listed below.
- This is the code for our ICLR 2025 paper, titled "Computing Circuits Optimization via Model-Based Circuit Genetic Evolution" ☆12 · Updated 8 months ago
- ☆26 · Updated last year
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition ☆17 · Updated 9 months ago
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA'25) ☆70 · Updated 9 months ago
- [WSDM'24 Oral] The official implementation of the paper "DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting" ☆23 · Updated last year
- ☆34 · Updated 10 months ago
- Curated collection of papers on MoE model inference ☆341 · Updated 3 months ago
- Some docs for rookies in nics-efc ☆22 · Updated 3 years ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24) ☆53 · Updated last year
- ☆224 · Updated 3 months ago
- ☆113 · Updated 2 years ago
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… ☆281 · Updated 2 months ago
- ☆145 · Updated last month
- An implementation of the DISP-LLM method from the NeurIPS 2024 paper "Dimension-Independent Structural Pruning for Large Language Models" ☆23 · Updated 6 months ago
- Code repository of "Evaluating Quantized Large Language Models" ☆136 · Updated last year
- Reading notes on speculative decoding papers ☆21 · Updated 2 months ago
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆87 · Updated 2 months ago
- Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Codes ☆411 · Updated 11 months ago
- SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs (ICML 2025) ☆32 · Updated 2 months ago
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24) ☆25 · Updated last year
- Code release for AdapMoE, accepted by ICCAD 2024 ☆35 · Updated 9 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆372 · Updated 7 months ago
- 🎓 Automatically updated list of LLM inference systems papers, refreshed every 12 hours via GitHub Actions ☆12 · Updated this week
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ☆155 · Updated 11 months ago
- Awesome list for LLM pruning ☆282 · Updated 4 months ago
- Summary of some awesome work for optimizing LLM inference ☆173 · Updated 2 months ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆147 · Updated last month
- Codebase for the ICML'24 paper "Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs" ☆27 · Updated last year
- Large Language Model (LLM) Serving Paper and Resource List ☆24 · Updated 8 months ago
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24) ☆174 · Updated last year