double125/MADTP

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

☆50

Alternatives and similar repositories for MADTP

Users who are interested in MADTP are comparing it to the repositories listed below.

SiyuanHuang95 / SUG
[ACM MM23] Pytorch implementation for paper: SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification
☆12 · Jul 4, 2023 · Updated 3 years ago
42Shawn / LLaVA-PruMerge
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆173 · Mar 8, 2026 · Updated 4 months ago
orailix / PACT
[CVPR 2025] PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
☆60 · Jan 30, 2026 · Updated 5 months ago
ch3cook-fdu / Vote2Cap-DETR
[T-PAMI 2024] & [CVPR 2023] Vote2Cap-DETR; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning met…
☆104 · Aug 17, 2024 · Updated last year
ChenAnno / SPIRIT_TOMM2024
Official implementation for "SPIRIT: Style-guided Patch Interaction for Fashion Image Retrieval with Text Feedback"
☆16 · Oct 27, 2025 · Updated 8 months ago
ChenAnno / FashionERN_AAAI2024
Official implementation for "FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval"
☆20 · Oct 27, 2025 · Updated 8 months ago
lzhxmu / VTW
Code release for VTW (AAAI 2025 Oral)
☆68 · Nov 4, 2025 · Updated 8 months ago
hulianyuyy / iLLaVA
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models (ICLR2026)
☆23 · Jun 24, 2026 · Updated last month
Pter61 / context-i2w
Context-I2W: Mapping Images to Context-dependent words for Accurate Zero-Shot Composed Image Retrieval [AAAI 2024 Oral]
☆54 · May 27, 2025 · Updated last year
ChenAnno / Real20M_ACMMM2023
Official implementation for "Real20M: A Large-scale E-commerce Dataset for Cross-domain Retrieval"
☆25 · Oct 27, 2025 · Updated 8 months ago
FarinaMatteo / multiflow
[CVPR '24] Official implementation of the paper "Multiflow: Shifting Towards Task-Agnostic Vision-Language Pruning".
☆24 · Mar 7, 2025 · Updated last year
xuyang-liu16 / MixKV
[ICLR 2026] Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
☆29 · Mar 21, 2026 · Updated 4 months ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆98 · Sep 14, 2024 · Updated last year
TerryPei / CSP
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference
☆10 · Dec 15, 2024 · Updated last year
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆69 · Jun 9, 2024 · Updated 2 years ago
hfutqian / AdaDFQ
☆22 · Oct 27, 2024 · Updated last year
Theia-4869 / VisPruner
[ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
☆84 · Jul 1, 2025 · Updated last year
merlresearch / SOCKET
Code for MERL's ECCV 2022 paper on Cross-Modal Knowledge Transfer Without Task-Relevant Source Data
☆11 · Jul 19, 2022 · Updated 4 years ago
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.
☆33 · May 16, 2024 · Updated 2 years ago
Sunshine-Ye / NIPS22-ST
☆12 · Oct 24, 2024 · Updated last year
PeiZhou26 / MaxMI
A Maximal Mutual Information Criterion for Manipulation Concept Discovery
☆14 · Sep 26, 2024 · Updated last year
sdc17 / UPop
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers
☆103 · Dec 30, 2024 · Updated last year
Yuhan-Shen / VisualNarrationProceL-CVPR21
☆15 · May 23, 2023 · Updated 3 years ago
PKU-ICST-MIPL / TARA_CVPR2026
☆17 · Mar 21, 2026 · Updated 4 months ago
pixeli99 / OWS
Official Pytorch Implementation of "Outlier-weighed Layerwise Sampling for LLM Fine-tuning" by Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei …
☆35 · Jun 3, 2025 · Updated last year
mlvlab / MCTF
Official implementation of CVPR 2024 paper "Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers".
☆40 · Jul 30, 2025 · Updated 11 months ago
fukunyin / CoCo-NeRF
Coordinates are not lonely - Codebook Prior Helps Implicit Neural 3D Representations
☆52 · Feb 15, 2023 · Updated 3 years ago
daiqing98 / The-Photographers-Eye
The Photographer's Eye: Teaching Multimodal Large Language Models to See, and Critique Like Photographers
☆19 · Dec 12, 2025 · Updated 7 months ago
mlvlab / vid-TLDR
Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".
☆55 · Oct 21, 2025 · Updated 9 months ago
mengchuang123 / VASparse-github
[CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
☆50 · Mar 24, 2025 · Updated last year
Jhonve / ImplicitPCDA
Domain Adaptation on Point Clouds via Geometry-Aware Implicits
☆26 · Sep 7, 2023 · Updated 2 years ago
daixiangzi / Awesome-Token-Compress
A paper list of some recent works about Token Compress for Vit and VLM
☆944 · Updated this week
hasanar1f / HiRED
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…
☆58 · Apr 18, 2025 · Updated last year
mlvlab / drone_ai_challenge
2021 Drone AI challenge
☆16 · Jan 4, 2022 · Updated 4 years ago
LesterGong / MMRB
The official repository of paper "Evaluating MLLMs with Multimodal Multi-image Reasoning Benchmark"
☆19 · Jun 20, 2025 · Updated last year
alibaba / EfficientAI
☆48 · May 9, 2026 · Updated 2 months ago
jiquan123 / TIER
TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment
☆10 · Mar 1, 2025 · Updated last year
merantix / acosp
Semantic Segmentation in Pytorch
☆10 · Dec 9, 2022 · Updated 3 years ago
SUSTechBruce / LOOK-M
[EMNLP 2024 Findings🔥] Official implementation of ": LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…
☆103 · Nov 9, 2024 · Updated last year