alibaba / alimama-video-narrator
Research code for the ACL 2024 paper "Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline"
☆31, updated 4 months ago
Alternatives and similar repositories for alimama-video-narrator:
Users who are interested in alimama-video-narrator are comparing it to the libraries listed below.
- A new multi-shot video understanding benchmark, Shot2Story, with comprehensive video summaries and detailed shot-level captions (☆132, updated 3 months ago)
- Repository for the ACM MM 2023 paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… (☆49, updated last year)
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … (☆111, updated last month)
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark (☆100, updated 2 weeks ago)
- A large-scale dataset for training and evaluating models' ability at dense text image generation (☆68, updated 2 months ago)
- 🔥🔥 First-ever hour-scale video understanding models (☆309, updated 2 weeks ago)
- Unifying Visual Understanding and Generation with Dual Visual Vocabularies 🌈 (☆43, updated 3 weeks ago)
- The HD-VG-130M Dataset (☆117, updated last year)
- Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing" (☆121, updated 6 months ago)
- Narrative movie understanding benchmark (☆70, updated 11 months ago)
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" (☆54, updated 6 months ago)
- [IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model (☆103, updated last month)
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P… (☆25, updated last year)
- Supercharged BLIP-2 that can handle videos (☆117, updated last year)
- LVBench: An Extreme Long Video Understanding Benchmark (☆89, updated 8 months ago)
- Official implementation of the paper "Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Vi… (☆172, updated last month)
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024] (☆88, updated 2 months ago)
- Official repository of the MMDU dataset (☆89, updated 7 months ago)
- Long Context Transfer from Language to Vision (☆374, updated last month)
- Video dataset dedicated to portrait-mode video recognition (☆48, updated 4 months ago)