multimodal-art-projection / IV-BenchLinks

☆13

Alternatives and similar repositories for IV-Bench

Users that are interested in IV-Bench are comparing it to the libraries listed below

Sorting:

SihengLi99 / TextBind
[2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wildrounded Conversation
☆47Updated 2 years ago
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆51Updated 7 months ago
zeyofu / ReFocus_Code
Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]]
☆39Updated 2 months ago
MAmmoTH-VL / MAmmoTH-VL
(ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
☆48Updated 4 months ago
mlfoundations / VisIT-Bench
☆50Updated last year
YuxiXie / V-DPO
Preference Learning for LLaVA
☆51Updated 10 months ago
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆57Updated last year
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆48Updated 6 months ago
TIGER-AI-Lab / MEGA-Bench
This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025]
☆77Updated 3 months ago
apple / ml-mia-bench
This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
☆31Updated 6 months ago
kaistAI / Volcano
[NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…
☆46Updated last year
Share14 / ShareGemini
☆31Updated last year
bcdnlp / FAITHSCORE
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
☆30Updated 6 months ago
declare-lab / LLM-PuzzleTest
This repository is maintained to release dataset and models for multimodal puzzle reasoning.
☆104Updated 7 months ago
lscpku / VITATECS
☆18Updated last year
TencentARC / GRPO-CARE
☆75Updated 3 months ago
WildVision-AI / WildVision-Bench
☆16Updated 11 months ago
HYPJUDY / Sparkles
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
☆44Updated last year
YiyangZhou / POVID
[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆88Updated last year
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆83Updated 8 months ago
Yangyi-Chen / CoTConsistency
The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".
☆34Updated 2 years ago
RifleZhang / LLaVA-Hound-DPO
☆153Updated 11 months ago
umd-huang-lab / Mementos
☆31Updated last year
ajd12342 / why-winoground-hard
Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022
☆31Updated 2 years ago
findalexli / mllm-dpo
[ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model
☆47Updated 10 months ago
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆19Updated 7 months ago
si0wang / VisVM
☆45Updated 9 months ago
eric-ai-lab / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆29Updated 2 months ago
Lookuz / VidHal
Codebase for VidHal: Benchmarking Hallucinations in Vision LLMs
☆14Updated 5 months ago
si0wang / ViCrit
☆23Updated 3 months ago