yuhui-zh15 / AutoConverter
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 2025)
☆23 Updated this week
Alternatives and similar repositories for AutoConverter:
Users who are interested in AutoConverter are comparing it to the repositories listed below.
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆63 Updated 9 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 Updated 8 months ago
- ☆66 Updated 2 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆74 Updated 5 months ago
- ☆38 Updated 2 months ago
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024) ☆28 Updated 4 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆40 Updated 3 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆63 Updated 6 months ago
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆80 Updated 10 months ago
- ☆41 Updated 4 months ago
- Official implementation of MIA-DPO ☆52 Updated last month
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆68 Updated 9 months ago
- ☆29 Updated 7 months ago
- Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆20 Updated 3 months ago
- [ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning ☆47 Updated last month
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆25 Updated 5 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆23 Updated last month
- ☆29 Updated 8 months ago
- ☆39 Updated 4 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆28 Updated 4 months ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". ☆41 Updated last year
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆18 Updated last month
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆34 Updated 4 months ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆33 Updated 4 months ago
- ☆21 Updated 4 months ago
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆35 Updated 3 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆76 Updated 4 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆82 Updated last year