alibaba / alimama-video-narrator
Research code for ACL2024 paper: "Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline"
☆26 Updated 2 months ago
Alternatives and similar repositories for alimama-video-narrator:
Users who are interested in alimama-video-narrator are comparing it to the repositories listed below
- ☆138 Updated 2 months ago
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ☆124 Updated last month
- ☆70 Updated last week
- ☆180 Updated 8 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆107 Updated 3 weeks ago
- Official implementation of the paper "Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Vi… ☆155 Updated 4 months ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆48 Updated last year
- ☆133 Updated last year
- T2VScore: Towards A Better Metric for Text-to-Video Generation ☆79 Updated 11 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆84 Updated last month
- mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023) ☆223 Updated last year
- LVBench: An Extreme Long Video Understanding Benchmark ☆85 Updated 6 months ago
- ☆61 Updated 7 months ago
- VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation ☆182 Updated last month
- Video dataset dedicated to portrait-mode video recognition. ☆44 Updated 3 months ago
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆208 Updated 8 months ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation ☆138 Updated 4 months ago
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale ☆204 Updated last year
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models ☆136 Updated 5 months ago
- The HD-VG-130M Dataset ☆116 Updated 11 months ago
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing ☆105 Updated 4 months ago
- [CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation ☆49 Updated this week
- ☆100 Updated 8 months ago
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L… ☆52 Updated last year
- ☆86 Updated 8 months ago
- Official repository of MMDU dataset ☆86 Updated 5 months ago
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks ☆293 Updated last year
- ☆73 Updated last year
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P… ☆25 Updated last year