Hongcheng-Gao / HAVEN
Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".
☆15Updated this week
Alternatives and similar repositories for HAVEN
Users that are interested in HAVEN are comparing it to the libraries listed below
Sorting:
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆54Updated last week
- ☆75Updated 4 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆57Updated 7 months ago
- [ICML 2025] Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in…☆53Updated this week
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆46Updated 7 months ago
- Official implement of MIA-DPO☆57Updated 3 months ago
- (ICLR2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.☆34Updated 2 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆73Updated 11 months ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆51Updated 6 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆71Updated 8 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆74Updated 6 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆49Updated 6 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆54Updated last month
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models☆107Updated 3 weeks ago
- A Self-Training Framework for Vision-Language Reasoning☆78Updated 3 months ago
- 🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.☆40Updated 2 weeks ago
- ☆23Updated 11 months ago
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale☆43Updated 5 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆66Updated 11 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆100Updated 2 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆74Updated 5 months ago
- [NIPS2023]Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector☆36Updated last year
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…☆78Updated 2 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆47Updated 2 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆56Updated 10 months ago
- Code release for VTW (AAAI 2025) Oral☆39Updated 4 months ago
- ☆44Updated last week
- ☆54Updated last year
- [ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding☆78Updated last month
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆46Updated 5 months ago