shuyansy / MLLM-Semantic-HallucinationLinks

🔥🔥[NeurIPS2025]Exploring and mitigating semantic hallucinations in scene text perception and reasoning

☆19

Alternatives and similar repositories for MLLM-Semantic-Hallucination

Users that are interested in MLLM-Semantic-Hallucination are comparing it to the libraries listed below

Sorting:

yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆59Updated last year
BAAI-DCAI / MMVU
☆52Updated 8 months ago
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆59Updated last year
Kevinz-code / SeVa
[MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501
☆59Updated last year
Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆67Updated 9 months ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆89Updated last year
findalexli / mllm-dpo
[ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model
☆48Updated last year
RainBowLuoCS / DEEM
(ICLR 2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.
☆44Updated 4 months ago
DreamMr / RAP
Code for Retrieval-Augmented Perception （ICML 2025)
☆60Updated 3 months ago
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆57Updated last year
minglllli / CLS-RL
[NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
☆73Updated 2 months ago
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65Updated 5 months ago
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆50Updated last year
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆86Updated 3 months ago
UCSC-VLAA / VLAA-Thinking
[TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆139Updated last month
Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆31Updated 3 months ago
mrwu-mac / ControlMLLM
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆197Updated 4 months ago
FreedomIntelligence / TRIM
We introduce new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…
☆15Updated 11 months ago
zifuwan / ONLY
[ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
☆41Updated 4 months ago
MME-Benchmarks / MME-Unify
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆41Updated 7 months ago
zjunlp / Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
☆120Updated 2 months ago
YuxiXie / V-DPO
Preference Learning for LLaVA
☆54Updated last year
XMUDeepLIT / AVG-LLaVA
Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"
☆33Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆180Updated last year
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
☆166Updated 5 months ago
xuyang-liu16 / GlobalCom2
[AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆34Updated 2 weeks ago
om-ai-lab / ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆61Updated 2 months ago
MikeWangWZHL / PAPO
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
☆96Updated 2 months ago
Brandon3964 / MultiModal-Task-Vector
[NeurIPS 2024] Official Code for the Paper "Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning"
☆24Updated 7 months ago
OpenSparseLLMs / CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
☆50Updated last year