falcon-xu / early-exit-papers

A curated list of early-exiting papers (LLM, CV, NLP, etc.)

☆68

Alternatives and similar repositories for early-exit-papers

Users who are interested in early-exit-papers are comparing it to the repositories listed below.

yxli2123 / LoSparse
☆61 Updated 2 years ago
liyunqianggyn / Awesome-LLMs-Pruning
An all-in-one repository of awesome LLM pruning papers, integrating useful resources and insights.
☆132 Updated 3 months ago
CASIA-IVA-Lab / FLAP
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆63 Updated last year
UbiquitousLearning / Efficient_Foundation_Model_Survey
Survey Paper List - Efficient LLM and Foundation Models
☆258 Updated last year
biomedical-cybernetics / Relative-importance-and-activation-pruning
☆52 Updated last year
pprp / Awesome-LLM-Prune
Awesome list for LLM pruning.
☆272 Updated last month
TianjinYellow / EdgeDeviceLLMCompetition-Starting-Kit
☆43 Updated last year
UbiquitousLearning / Paper-list-resource-efficient-large-language-model
☆101 Updated last year
luuyin / OWL
Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
☆74 Updated 4 months ago
IST-DASLab / OBC
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
☆129 Updated 2 years ago
tiingweii-shii / Awesome-Resource-Efficient-LLM-Papers
A curated list of high-quality papers on resource-efficient LLMs 🌱
☆148 Updated 8 months ago
WoosukKwon / retraining-free-pruning
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
☆192 Updated 2 years ago
LiuXiaoxuanPKU / OSD
☆60 Updated 11 months ago
henryzhongsc / longctx_bench
Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…
☆86 Updated 8 months ago
dilab-zju / self-speculative-decoding
Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding"
☆208 Updated 9 months ago
BaiTheBest / SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆67 Updated 7 months ago
Xingrun-Xing2 / EfficientLLM
A family of efficient edge language models in 100M~1B sizes.
☆18 Updated 9 months ago
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆128 Updated 2 weeks ago
zhengzangw / Sequence-Scheduling
PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆93 Updated 2 years ago
SNU-ARC / any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆121 Updated 4 months ago
ZO-Bench / ZO-LLM
[ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark".
☆115 Updated 4 months ago
txsun1997 / awesome-early-exiting
A curated list of Early Exiting papers, benchmarks, and misc.
☆119 Updated 2 years ago
zyxxmu / DSnoT
Official PyTorch implementation of our ICLR 2024 paper, Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM…
☆50 Updated last year
hemingkx / Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆330 Updated 6 months ago
AboveParadise / LLMCBench
☆25 Updated 11 months ago
hmarkc / parallel-prompt-decoding
Efficient LLM Inference Acceleration using Prompting
☆51 Updated last year
zyxxmu / cam
PyTorch implementation of our ICML 2024 paper, CaM: Cache Merging for Memory-efficient LLMs Inference
☆47 Updated last year
raymin0223 / fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
☆64 Updated last year
ROIM1998 / APT
[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
☆45 Updated last year
machilusZ / FastGen
This repo contains the source code for "Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs"
☆41 Updated last year