chengtan9907 / mc-cot
The official implementation of the ECCV'24 paper MC-CoT: Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training.
☆13 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for mc-cot
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023. ☆32 · Updated last year
- Official implementation of TagAlign ☆32 · Updated 7 months ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆17 · Updated 2 months ago
- CLIP-MoE: Mixture of Experts for CLIP ☆17 · Updated 3 weeks ago
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat… ☆71 · Updated 7 months ago
- Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023) ☆37 · Updated 10 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆24 · Updated 2 months ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆28 · Updated last month
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆48 · Updated last month
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆48 · Updated 5 months ago
- Turning to Video for Transcript Sorting ☆46 · Updated last year
- [ACL 2024] Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models. Detect and mitigate object hallucinatio… ☆16 · Updated 4 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆27 · Updated 5 months ago
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt… ☆32 · Updated 4 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆56 · Updated last year
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆50 · Updated last month
- Code for paper "AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention" ☆15 · Updated 3 months ago
- Making LLaVA Tiny via MoE-Knowledge Distillation ☆55 · Updated 2 weeks ago
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model ☆82 · Updated 11 months ago
- HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Vision-Language M… ☆13 · Updated 2 months ago
- [ICCV2023] Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm ☆15 · Updated last year
- A project for tri-modal LLM benchmarking and instruction tuning. ☆13 · Updated this week
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆77 · Updated last month
- The efficient tuning method for VLMs ☆75 · Updated 8 months ago
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning ☆43 · Updated 5 months ago
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection". ☆27 · Updated last month
- Visual question answering prompting recipes for large vision-language models ☆21 · Updated last month