xuyang-liu16 / Awesome-Token-Reduction-for-Model-Compression

📚 Collection of token reduction for model compression resources.

☆47

Alternatives and similar repositories for Awesome-Token-Reduction-for-Model-Compression:

Users who are interested in Awesome-Token-Reduction-for-Model-Compression are comparing it to the repositories listed below.

Gumpest / SparseVLMs
Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference".
☆80 Updated 2 weeks ago
thu-nics / FrameFusion
The official code implementation of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
☆37 Updated last month
Theia-4869 / FasterVLM
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
☆57 Updated 3 months ago
JinXins / Awesome-Token-Merge-for-MLLMs
A paper list about Token Merge, Reduce, Resample, Drop for MLLMs.
☆44 Updated 2 months ago
ZLKong / awesome-token-reduction
A collection of recent token reduction (token pruning, merging, clustering, etc.) techniques for ML/AI
☆27 Updated this week
ChangyuanWang17 / QVLM
[NeurIPS'24] Efficient and accurate memory saving method towards W4A4 large multi-modal models.
☆67 Updated 2 months ago
xuyang-liu16 / Awesome-Generation-Acceleration
📚 Collection of awesome generation acceleration resources.
☆179 Updated 2 weeks ago
lzhxmu / VTW
Code release for VTW (AAAI 2025) Oral
☆32 Updated 2 months ago
KD-TAO / DyCoke
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
☆35 Updated this week
ZichenWen1 / DART
Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"
☆22 Updated this week
SUSTechBruce / LOOK-M
[EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…
☆92 Updated 4 months ago
Purshow / Awesome-Unified-Multimodal
☆50 Updated this week
42Shawn / LLaVA-PruMerge
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆119 Updated 10 months ago
liuting20 / MustDrop
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model
☆22 Updated 2 months ago
OpenSparseLLMs / Skip-DiT
✈️ Accelerating Vision Diffusion Transformers with Skip Branches.
☆62 Updated 3 months ago
LINs-lab / DynMoE
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆78 Updated last month
ThisisBillhe / ZipAR
This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality"
☆46 Updated 2 months ago
xuyang-liu16 / GlobalCom2
Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆16 Updated this week
adreamwu / PTQ4DiT
PyTorch implementation of PTQ4DiT https://arxiv.org/abs/2405.16005
☆26 Updated 4 months ago
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆80 Updated 2 weeks ago
ywh187 / FitPrune
☆40 Updated 2 months ago
daixiangzi / Awesome-Token-Compress
A paper list of recent work on token compression for ViT and VLM.
☆377 Updated 2 weeks ago
thu-nics / MBQ
The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models"
☆35 Updated last week
Hsu1023 / DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆149 Updated 5 months ago
horseee / learning-to-cache
[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
☆98 Updated 8 months ago
NUS-HPC-AI-Lab / Dynamic-Tuning
The official implementation of "Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation" (NeurIPS 2024)
☆43 Updated 2 months ago
htqin / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆63 Updated 11 months ago
MAC-AutoML / QuoTA
This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehens…
☆63 Updated last week
shufangxun / LLaVA-MoD
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
☆116 Updated 2 months ago
thu-nics / ViDiT-Q
[ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
☆67 Updated this week