THU-MIG / VTC-CLS

Official repo for the paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs"

☆23

Alternatives and similar repositories for VTC-CLS

Users who are interested in VTC-CLS are comparing it to the repositories listed below.

yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆59 Updated last year
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆32 Updated 8 months ago
zycheiheihei / Transferable-Visual-Prompting
[CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…
☆46 Updated 11 months ago
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65 Updated 5 months ago
Kwai-YuanQi / TaskGalaxy
Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
☆32 Updated 4 months ago
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆50 Updated last year
tripletclip / TripletCLIP
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
☆45 Updated 11 months ago
maifoundations / Visionary-R1
Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
☆41 Updated 4 months ago
mlvlab / RALF
Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".
☆44 Updated last year
locuslab / llava-token-compression
☆44 Updated last year
tian1327 / SWAT
[CVPR 2025] Few-shot Recognition via Stage-Wise Retrieval-Augmented Finetuning
☆27 Updated 2 months ago
OpenSparseLLMs / CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
☆50 Updated last year
LaVi-Lab / AIM
[ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"
☆44 Updated last month
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆57 Updated last year
Liuziyu77 / MIA-DPO
Official implementation of MIA-DPO
☆67 Updated 9 months ago
alibaba / conv-llava
☆123 Updated last year
Yaxin9Luo / Gamma-MOD
[ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆40 Updated 3 weeks ago
XMUDeepLIT / AVG-LLaVA
Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"
☆33 Updated last year
FreedomIntelligence / TRIM
We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…
☆15 Updated 11 months ago
yfzhang114 / LLaVA-Align
[ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual…
☆82 Updated 8 months ago
rui-qian / READ
Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)
☆48 Updated last month
OoDBag / VisTA
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
☆20 Updated 5 months ago
HKUST-LongGroup / CoMM
Official repository for CoMM Dataset
☆48 Updated 10 months ago
JiuTian-VL / MoME
[NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
☆74 Updated 6 months ago
iancovert / locality-alignment
☆53 Updated 10 months ago
yu-rp / apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
☆106 Updated last year
foundation-multimodal-models / CAPTURE
☆79 Updated last year
adobe-research / llava-score
☆11 Updated last year
wjpoom / SPEC
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆49 Updated 5 months ago
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆80 Updated 3 weeks ago