zli12321 / VideoHallu
Synthetic Video hallucination and Mitigation
☆18Updated 4 months ago
Alternatives and similar repositories for VideoHallu
Users that are interested in VideoHallu are comparing it to the libraries listed below
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…☆1,349Updated 2 months ago
- A collection and survey of the latest vision-language model papers and models. Continuously updated.☆519Updated this week
- Visualizing the attention of vision-language models☆279Updated 11 months ago
- Code for the paper "SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents"☆63Updated 11 months ago
- Embodied Agent Interface (EAI): Benchmarking LLMs for Embodied Decision Making (NeurIPS D&B 2024 Oral)☆278Updated 11 months ago
- Focused on the safety and security of Embodied AI☆96Updated last month
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…☆1,313Updated last week
- ☆15Updated 10 months ago
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).☆975Updated 4 months ago
- Benchmarking Physical Risk Awareness of Foundation Model-based Embodied AI Agents☆23Updated last year
- An up-to-date curated list of state-of-the-art research, papers, and resources on hallucination in large vision-language models☆262Updated 4 months ago
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆340Updated 9 months ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆956Updated 2 months ago
- A flexible and efficient codebase for training visually-conditioned language models (VLMs)☆917Updated last year
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning☆405Updated last year
- Top solution for the 2025 CCF BDCI DeepSearch track☆89Updated last month
- MemoryEQA☆23Updated 2 months ago
- [NeurIPS 2025 Spotlight] Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning.☆118Updated 3 weeks ago
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆423Updated last year
- Monitoring recent cross-research on LLM & RL on arXiv for control. If there are good papers, PRs are welcome.☆540Updated 2 months ago
- [EMNLP 2024 Main] Official implementation of the paper "To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimoda…☆17Updated last year
- A comprehensive list of PAPERS, CODEBASES, and DATASETS on Decision Making using Foundation Models including LLMs and VLMs.☆385Updated last year
- [AAAI 2026] Data and Code for Paper IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks☆40Updated 2 months ago
- Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information☆22Updated 9 months ago
- Summaries of ICML 2024 papers☆12Updated last year
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]☆816Updated last month
- Papers about Hallucination in Multi-Modal Large Language Models (MLLMs)☆100Updated last year
- Training VLM agents with multi-turn reinforcement learning☆391Updated last week
- ICLR 2025 Agent-Related Papers☆75Updated last year
- 🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.☆137Updated 3 weeks ago