csarron / PuMer
[ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
☆29Updated 5 months ago
Alternatives and similar repositories for PuMer:
Users that are interested in PuMer are comparing it to the libraries listed below
- We introduce new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…☆13Updated 3 months ago
- ☆39Updated 4 months ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.☆32Updated last year
- This is the official repo for ByteVideoLLM/Dynamic-VLM☆20Updated 3 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆16Updated 5 months ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆53Updated last year
- Adapting LLaMA Decoder to Vision Transformer☆28Updated 10 months ago
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.☆27Updated last year
- Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)☆38Updated last year
- ☆23Updated 5 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆13Updated last month
- ☆24Updated last year
- [ICLR2025] γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆33Updated last month
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning☆48Updated 10 months ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision☆36Updated this week
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆48Updated 4 months ago
- ☆30Updated 2 years ago
- Official implementation of TagAlign☆34Updated 3 months ago
- [ICLR 23] Contrastive Aligned of Vision to Language Through Parameter-Efficient Transfer Learning☆38Updated last year
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆58Updated last year
- [ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.☆32Updated 2 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆26Updated last month
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆51Updated 4 months ago
- An Enhanced CLIP Framework for Learning with Synthetic Captions☆28Updated 3 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆32Updated 9 months ago
- Official implementation for FlexAttention for Efficient High-Resolution Vision-Language Models☆37Updated 2 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆44Updated 2 months ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning☆20Updated 6 months ago
- Mixture of Attention Heads☆43Updated 2 years ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆57Updated 6 months ago