Nicous20 / FunQA
FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, and beyond.
☆94 Updated 2 months ago
Related projects:
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆75 Updated 3 weeks ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆80 Updated 2 months ago
- ☆70 Updated 4 months ago
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering ☆175 Updated 8 months ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆104 Updated last week
- Official repo for StableLLAVA ☆90 Updated 9 months ago
- ☆110 Updated 4 months ago
- ☆101 Updated 5 months ago
- ☆128 Updated 9 months ago
- ☆53 Updated 7 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆73 Updated 2 months ago
- Official Dataloader and Evaluation Scripts for LongVideoBench. ☆52 Updated last month
- Official repository of MMDU dataset ☆61 Updated last month
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models'' ☆67 Updated 5 months ago
- ☆100 Updated last year
- TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering ☆132 Updated 4 months ago
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models! ☆114 Updated 8 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… ☆46 Updated last year
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆39 Updated last week
- ☆80 Updated 4 months ago
- ☆43 Updated 2 months ago
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L… ☆47 Updated 6 months ago
- ☆28 Updated this week
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆33 Updated 4 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆100 Updated 2 months ago
- ☆99 Updated last week
- ☆52 Updated 4 months ago
- ☆83 Updated 9 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 Updated last month
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆112 Updated 2 months ago