MIRALab-USTC / LLMReasoning-SpecSearch
This is the source code of our ICML 2025 paper, "Accelerating Large Language Model Reasoning via Speculative Search".
☆20 · Updated last month
Alternatives and similar repositories for LLMReasoning-SpecSearch
Users interested in LLMReasoning-SpecSearch are comparing it to the libraries listed below.
- This is the code for our ICLR 2025 paper, "Computing Circuits Optimization via Model-Based Circuit Genetic Evolution". ☆11 · Updated last month
- Official code implementation for the ICLR 2025 accepted paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives" ☆36 · Updated 4 months ago
- [WSDM'24 Oral] The official implementation of the paper "DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting" ☆22 · Updated last year
- Curated collection of papers on MoE model inference ☆213 · Updated 5 months ago
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA 2025) ☆46 · Updated 3 months ago
- ☆23 · Updated 4 months ago
- Code repository of "Evaluating Quantized Large Language Models" ☆129 · Updated 10 months ago
- ☆23 · Updated last year
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition ☆14 · Updated 3 months ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24) ☆42 · Updated 7 months ago
- ☆46 · Updated 9 months ago
- ☆25 · Updated 2 months ago
- Reading notes on speculative decoding papers ☆13 · Updated 2 weeks ago
- Awesome Artificial Intelligence for Electronic Design Automation Papers ☆176 · Updated last year
- ☆105 · Updated last year
- Code release for AdapMoE, accepted at ICCAD 2024 ☆26 · Updated 2 months ago
- Awesome list for LLM pruning ☆245 · Updated 7 months ago
- ☆114 · Updated 3 weeks ago
- An implementation of the DISP-LLM method from the NeurIPS 2024 paper "Dimension-Independent Structural Pruning for Large Language Models" ☆21 · Updated 3 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆306 · Updated 2 weeks ago
- ☆169 · Updated last year
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs ☆164 · Updated 9 months ago
- ☆13 · Updated last year
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗) ☆490 · Updated last month
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… ☆165 · Updated this week
- [ICLR 2025] OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt… ☆68 · Updated 3 months ago
- ☆217 · Updated last year
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24) ☆146 · Updated last year
- Some docs for rookies in nics-efc ☆22 · Updated 3 years ago
- [DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive La… ☆59 · Updated last year