THU-MIG / VTC-CLS
Official repository for the paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs"
☆14 · Updated 3 months ago
Alternatives and similar repositories for VTC-CLS:
Users interested in VTC-CLS are comparing it to the repositories listed below.
- Official code for the paper "Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models, ICML2024" ☆24 · Updated 2 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆28 · Updated last month
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆37 · Updated 4 months ago
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt… ☆39 · Updated 4 months ago
- Official implementation of MC-LLaVA ☆24 · Updated 2 months ago
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆35 · Updated 2 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆33 · Updated 10 months ago
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ☆27 · Updated 2 weeks ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆31 · Updated 2 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆37 · Updated last month
- Official implementation of TagAlign ☆34 · Updated 4 months ago
- LEO: A powerful Hybrid Multimodal LLM ☆17 · Updated 3 months ago
- Code for the ICML 2023 paper "Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation" ☆37 · Updated last year
- Official PyTorch code for "Is Synthetic Data From Diffusion Models Ready for Knowledge Distillation?" (https://arxiv.org/abs/2305.12954) ☆46 · Updated last year
- (CVPR 2024) "Unsegment Anything by Simulating Deformation" ☆28 · Updated 10 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆26 · Updated 3 months ago
- [CVPR 2024 Highlight] ImageNet-D ☆42 · Updated 6 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆52 · Updated 5 months ago
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality ☆16 · Updated 6 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆28 · Updated 10 months ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆68 · Updated 6 months ago
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆35 · Updated 2 months ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆23 · Updated 2 weeks ago
- [CVPR 2025] Few-shot Recognition via Stage-Wise Retrieval-Augmented Finetuning ☆14 · Updated 2 weeks ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆34 · Updated this week