Hongcheng-Gao / HAVENView external linksLinks
Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".
☆23Oct 22, 2025Updated 3 months ago
Alternatives and similar repositories for HAVEN
Users that are interested in HAVEN are comparing it to the libraries listed below
Sorting:
- ☆10Aug 10, 2024Updated last year
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆13Jun 28, 2025Updated 7 months ago
- ☆18Jun 10, 2025Updated 8 months ago
- ☆33Apr 22, 2025Updated 9 months ago
- ☆13Apr 23, 2025Updated 9 months ago
- [ACL 2025] Can MLLMs Understand the Deep Implication Behind Chinese Images?☆20Oct 20, 2025Updated 3 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- [NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding"☆22Dec 8, 2024Updated last year
- \infty-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆19Feb 14, 2025Updated last year
- This repository contains code and datasets for our paper on the effects of document multiplicity while the context size is fixed in Retri…☆18Mar 13, 2025Updated 11 months ago
- Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness"☆21Feb 17, 2025Updated 11 months ago
- ☆23Jul 2, 2025Updated 7 months ago
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders☆42Jun 10, 2025Updated 8 months ago
- Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation: A framework for generating multimodal music by bridging dif…☆28Jan 21, 2025Updated last year
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆85Nov 10, 2024Updated last year
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning☆39Oct 14, 2025Updated 4 months ago
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention☆61Jul 16, 2024Updated last year
- ☆34Oct 9, 2025Updated 4 months ago
- ☆46Jun 24, 2025Updated 7 months ago
- ☆77Jan 22, 2026Updated 3 weeks ago
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning☆46Jul 17, 2025Updated 6 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆26Oct 17, 2024Updated last year
- Official implementation of "URECA : Unique Region Caption Anything"☆56Jul 13, 2025Updated 7 months ago
- A Text2SQL benchmark for evaluation of Large Language Models☆41Feb 8, 2026Updated last week
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆57Oct 28, 2024Updated last year
- VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice☆61Jan 9, 2026Updated last month
- We introduce CausalVQA, a benchmark dataset for video question answering (VQA) composed of question-answer pairs that probe models’ under…☆52Aug 18, 2025Updated 5 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆28Jul 15, 2025Updated 7 months ago
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding"☆57Jan 23, 2026Updated 3 weeks ago
- [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs☆52Dec 7, 2025Updated 2 months ago
- ☆54Feb 7, 2026Updated last week
- HallE-Control: Controlling Object Hallucination in LMMs☆31Apr 10, 2024Updated last year
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios☆16Oct 18, 2024Updated last year
- A library of visualization tools for the interpretability and hallucination analysis of large vision-language models (LVLMs).☆41May 22, 2025Updated 8 months ago
- LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling☆187Jan 26, 2026Updated 2 weeks ago
- Official PyTorch implementation of "CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning" @ ICCV 2023☆39Oct 16, 2025Updated 3 months ago
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆38Jan 27, 2026Updated 2 weeks ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Jun 7, 2024Updated last year
- AgenTracer: A Lightweight Failure Attributor for Agentic Systems☆75Nov 12, 2025Updated 3 months ago