bytedance / Shot2Story
Shot2Story: a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.
☆98Updated last month
Related projects
Alternatives and complementary repositories for Shot2Story
- ☆164Updated 3 months ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"☆126Updated 2 months ago
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models☆109Updated last month
- [ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper☆126Updated 6 months ago
- ☆126Updated last week
- [arXiv 2024] Official PyTorch implementation of "VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion…☆146Updated 7 months ago
- ☆189Updated 3 months ago
- ☆145Updated 3 weeks ago
- ☆138Updated this week
- ☆210Updated 6 months ago
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models☆242Updated 10 months ago
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing☆90Updated this week
- Supercharged BLIP-2 that can handle videos☆116Updated 11 months ago
- ☆165Updated 4 months ago
- ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation (TMLR 2024)☆214Updated 4 months ago
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties☆117Updated last week
- ☆119Updated last month
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆83Updated 2 weeks ago
- ☆72Updated 5 months ago
- Long Context Transfer from Language to Vision☆328Updated 2 weeks ago
- Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding☆212Updated 2 months ago
- Multimodal Models in Real World☆400Updated last week
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it☆131Updated 9 months ago
- Official code for 'Paragraph-to-Image Generation with Information-Enriched Diffusion Model'☆94Updated 5 months ago
- Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (COLM 2024)☆167Updated 3 months ago
- Official repo for StableLLAVA☆90Updated 10 months ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions☆126Updated 9 months ago
- [CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding☆240Updated 3 months ago
- ☆258Updated this week