This is the source code for our ICML 2025 paper, "Accelerating Large Language Model Reasoning via Speculative Search".
Alternatives and similar repositories for LLMReasoning-SpecSearch
Users interested in LLMReasoning-SpecSearch are comparing it to the repositories listed below.
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch…
- ChiPBench: Benchmarking End-to-End Performance of AI-based Chip Placement Algorithms
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition
- Channel pruning for accelerating very deep neural networks
- LLM quantization toolkit
- Reading notes on speculative decoding papers
- TPAMI 2025 survey paper
- PyTorch implementation of our paper accepted by NeurIPS 2022 -- Learning Best Combination for Efficient N:M Sparsity
- BESA is a differentiable weight pruning technique for large language models.
- This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act…
- Evolutionary Algorithms and Large Language Models
- [ICML 2025] KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
- Here is the Feiyue handbook for all ECE students, including 1) how to prepare for your application, 2) official program organized by HUST…
- [ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models".
- PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation [NeurIPS 2025]
- Code for KDD 2023 long paper: MetricPrompt: Prompting Model as a Relevance Metric for Few-Shot Text Classification
- ThinK: Thinner Key Cache by Query-Driven Pruning
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
- AbstainQA, ACL 2024
- Implementation of "Effective Sparsification of Neural Networks with Global Sparsity Constraint"
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)
- [ICLR 2025] Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023)
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
- PyTorch implementation of TPAMI 2022 -- 1xN Pattern for Pruning Convolutional Neural Networks
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache
- Nachos labs for the XMU operating systems course
- Awesome list for LLM pruning.