xuyang-liu16 / Awesome-Token-level-Model-Compression
Collection of token-reduction resources for model compression.
☆51 · Updated last week
Alternatives and similar repositories for Awesome-Token-level-Model-Compression:
Users interested in Awesome-Token-level-Model-Compression are comparing it to the libraries listed below
- Official implementation of the paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆93 · Updated last month
- The official code implementation of the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models". ☆39 · Updated last week
- Code release for VTW (AAAI 2025 Oral). ☆34 · Updated 3 months ago
- Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster". ☆70 · Updated 4 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model. ☆27 · Updated 3 months ago
- A collection of recent token reduction (token pruning, merging, clustering, etc.) techniques for ML/AI. ☆39 · Updated this week
- [arXiv 2025] Efficient Reasoning Models: A Survey. ☆107 · Updated this week
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More". ☆36 · Updated 3 weeks ago
- [NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models. ☆71 · Updated 3 months ago
- [EMNLP 2024 Findings 🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…". ☆92 · Updated 5 months ago
- ☆42 · Updated 3 months ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ☆50 · Updated 3 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models. ☆126 · Updated 11 months ago
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification. ☆21 · Updated 3 weeks ago
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…". ☆31 · Updated 4 months ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction. ☆89 · Updated last month
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality". ☆46 · Updated 3 weeks ago
- DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models. ☆40 · Updated 2 weeks ago
- Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models. ☆18 · Updated last week
- Collection of awesome generation acceleration resources. ☆215 · Updated this week
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models". ☆37 · Updated last month
- [NeurIPS 2024 Oral 🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ☆157 · Updated 6 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models. ☆87 · Updated 2 months ago
- Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints. ☆65 · Updated 3 weeks ago
- This is a repository for organizing papers, code, and other resources related to unified multimodal models. ☆173 · Updated 2 weeks ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…. ☆64 · Updated last year
- ☆99 · Updated 9 months ago
- PyTorch code for our paper "ARB-LLM: Alternating Refined Binarizations for Large Language Models". ☆24 · Updated last month
- PyTorch implementation of PTQ4DiT (https://arxiv.org/abs/2405.16005). ☆27 · Updated 5 months ago
- ☆80 · Updated last month