shuyansy / MLLM-Semantic-Hallucination
🔥🔥 [NeurIPS2025] Exploring and mitigating semantic hallucinations in scene text perception and reasoning
☆17 · Updated 2 weeks ago
Alternatives and similar repositories for MLLM-Semantic-Hallucination
Users who are interested in MLLM-Semantic-Hallucination are comparing it to the repositories listed below
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆103 · Updated 4 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆86 · Updated last year
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆24 · Updated 11 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆189 · Updated 3 months ago
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆57 · Updated last month
- Official implementation of MIA-DPO ☆66 · Updated 8 months ago
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation ☆16 · Updated 7 months ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆18 · Updated 5 months ago
- [CVPR 2025] RAP: Retrieval-Augmented Personalization ☆71 · Updated 2 months ago
- [ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆92 · Updated 2 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆58 · Updated 11 months ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification ☆37 · Updated 6 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆87 · Updated 2 months ago
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆57 · Updated 3 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆82 · Updated 2 months ago
- Code for Retrieval-Augmented Perception (ICML 2025) ☆57 · Updated 2 months ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆67 · Updated 3 weeks ago
- [NeurIPS 2024] Official Code for the Paper "Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning" ☆24 · Updated 6 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆218 · Updated last month
- [NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models' ☆193 · Updated 2 months ago
- [CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents ☆41 · Updated 4 months ago
- Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆33 · Updated 2 months ago
- ☆51 · Updated 6 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆67 · Updated 3 weeks ago
- ☆35 · Updated last month
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos" ☆40 · Updated 3 months ago
- ☆92 · Updated 9 months ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆135 · Updated 5 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆66 · Updated 4 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆166 · Updated last year