sled-group / mohLinks

[NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models

☆33

Alternatives and similar repositories for moh

Users that are interested in moh are comparing it to the libraries listed below

Sorting:

YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆81Updated last month
Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆31Updated 3 months ago
TencentARC / GRPO-CARE
☆79Updated 5 months ago
orrzohar / Video-STaR
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆71Updated last year
si0wang / VisVM
☆46Updated 11 months ago
Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆67Updated 10 months ago
shiqichen17 / AdaptVis
Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)
☆61Updated 7 months ago
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆33Updated 9 months ago
showlab / UniRL
The code repository of UniRL
☆46Updated 6 months ago
tsunghan-wu / reverse_vlm
🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…
☆47Updated 2 months ago
yale-nlp / TOMATO
☆35Updated last year
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆70Updated last year
Mr-Loevan / FAST
Fast-Slow Thinking for Large Vision-Language Model Reasoning
☆21Updated 7 months ago
eric-ai-lab / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆29Updated 4 months ago
UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆197Updated 4 months ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆90Updated last year
Gabesarch / grounded-rl
☆107Updated 4 months ago
Aurora-slz / MM-Verify
☆15Updated last month
visual-haystacks / mirage
🔥 [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
☆22Updated 9 months ago
TencentARC / SEED-Bench-R1
☆94Updated 5 months ago
mll-lab-nu / TStar
TStar is a unified temporal search framework for long-form video question answering
☆73Updated 3 months ago
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆150Updated 2 months ago
Haochen-Wang409 / TreeVGR
Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"
☆71Updated last month
HKUST-LongGroup / CoMM
Official repository for CoMM Dataset
☆48Updated 11 months ago
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆37Updated last year
Cooperx521 / ScaleCap
Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing’
☆58Updated 5 months ago
amitakamath / whatsup_vlms
Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".
☆66Updated last year
ruili33 / TPO
☆39Updated 2 months ago
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65Updated 5 months ago
gyhdog99 / RACRO2
Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)
☆19Updated 5 months ago