yuezih / Movie101
Narrative movie understanding benchmark
☆70 · Updated last year
Alternatives and similar repositories for Movie101
Users who are interested in Movie101 are comparing it to the libraries listed below.
- R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024) ☆83 · Updated 11 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆82 · Updated last month
- ☆74 · Updated 6 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆39 · Updated 2 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆115 · Updated 2 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ☆97 · Updated 4 months ago
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval ☆38 · Updated 2 years ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆48 · Updated 10 months ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆51 · Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆47 · Updated 2 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆43 · Updated last year
- [ACL 2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information ☆12 · Updated 7 months ago
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team. ☆71 · Updated 7 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆54 · Updated 11 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆60 · Updated 11 months ago
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding ☆34 · Updated 2 months ago
- A Large-scale Dataset for training and evaluating model's ability on Dense Text Image Generation ☆69 · Updated 3 months ago
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023) ☆101 · Updated 4 months ago
- ☆72 · Updated last year
- [ECCV'24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario… ☆53 · Updated 9 months ago
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P… ☆25 · Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆26 · Updated last year
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment ☆52 · Updated last year
- R1-like Video-LLM for Temporal Grounding ☆92 · Updated last week
- [CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection ☆94 · Updated 10 months ago
- LMM solved catastrophic forgetting, AAAI2025 ☆43 · Updated last month
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR 2023) ☆40 · Updated 2 years ago
- Official implementation of paper AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding ☆61 · Updated last month
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ☆101 · Updated 5 months ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆59 · Updated 2 months ago