bytedance / Shot2Story
A new multi-shot video understanding benchmark, Shot2Story, with comprehensive video summaries and detailed shot-level captions.
☆129Updated 2 months ago
Alternatives and similar repositories for Shot2Story:
Users who are interested in Shot2Story are comparing it to the repositories listed below:
- Long Context Transfer from Language to Vision☆372Updated last month
- Code repository for T2V-Turbo and T2V-Turbo-v2☆296Updated 2 months ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"☆177Updated 3 months ago
- [AAAI 2025] Official pytorch implementation of "VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion …☆158Updated last year
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties☆122Updated 5 months ago
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models☆143Updated 6 months ago
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation☆279Updated last month
- ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation (TMLR 2024)☆242Updated 9 months ago
- Supercharged BLIP-2 that can handle videos☆117Updated last year
- Multimodal Models in Real World☆492Updated last month
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it☆134Updated last year
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"☆431Updated 7 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆127Updated 5 months ago
- The HD-VG-130M Dataset☆117Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆110Updated 2 weeks ago
- Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models (ICLR 2024)☆140Updated 11 months ago
- [ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper☆149Updated 11 months ago
- [IJCV'24] AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort☆150Updated 4 months ago
- [ECCV 2024] Official PyTorch implementation of "Getting it Right: Improving Spatial Consistency in Text-to-Image Models"☆99Updated 9 months ago
- [IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model☆103Updated 3 weeks ago
- [NeurIPS 2024 Spotlight] The official implementation of the research paper "MotionBooth: Motion-Aware Customized Text-to-Video Generation"☆130Updated 6 months ago
- Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope…☆266Updated last month
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions☆129Updated last year