THUNLP-MT / CODIS
Repo for the paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".
☆11 · Updated 7 months ago
Alternatives and similar repositories for CODIS
Users interested in CODIS are comparing it to the repositories listed below.
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ☆46 · Updated 7 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆49 · Updated 6 months ago
- Code and data for the ACL 2024 paper 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space' ☆13 · Updated 9 months ago
- Official Code of IdealGPT ☆35 · Updated last year
- [EMNLP 2023] InfoSeek: A New VQA Benchmark Focusing on Visual Info-Seeking Questions ☆22 · Updated 11 months ago
- An Easy-to-use Hallucination Detection Framework for LLMs ☆58 · Updated last year
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models ☆74 · Updated 6 months ago
- [arXiv 2504.09130] VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search ☆16 · Updated 3 weeks ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆54 · Updated last month
- A Comprehensive Benchmark for Robust Multi-image Understanding ☆11 · Updated 8 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆73 · Updated 11 months ago
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆34 · Updated last week
- Counterfactual Reasoning VQA Dataset ☆25 · Updated last year
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension ☆66 · Updated 11 months ago
- [ICML 2025] Official implementation of the paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in… ☆53 · Updated this week
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆54 · Updated last week
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation ☆94 · Updated last year
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding ☆29 · Updated 2 weeks ago
- Code for the paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models" ☆42 · Updated 6 months ago
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆43 · Updated 5 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆74 · Updated 5 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (CVPR 2024) ☆45 · Updated 10 months ago
- Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability" ☆32 · Updated 10 months ago
- Code and datasets for "What's "up" with vision-language models? Investigating their struggle with spatial reasoning" ☆48 · Updated last year
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆56 · Updated 6 months ago