yuhui-zh15 / AutoConverter
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 2025)
☆25 · Updated last week
Alternatives and similar repositories for AutoConverter:
Users who are interested in AutoConverter are comparing it to the repositories listed below:
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆65 · Updated 9 months ago
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024) ☆28 · Updated 5 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 · Updated 8 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆70 · Updated 9 months ago
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆80 · Updated 10 months ago
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization ☆63 · Updated this week
- Official Codebase for "Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers" ☆13 · Updated this week
- [ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning ☆48 · Updated last month
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆40 · Updated 3 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆77 · Updated 5 months ago
- Official implementation of MIA-DPO ☆54 · Updated 2 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆63 · Updated 6 months ago
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆35 · Updated 3 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆74 · Updated 6 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆25 · Updated 6 months ago
- Language Repository for Long Video Understanding ☆31 · Updated 9 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆33 · Updated this week
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". ☆42 · Updated last year
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆35 · Updated 5 months ago
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge. ☆65 · Updated last month
- [ICLR 2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆67 · Updated 6 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆37 · Updated 4 months ago