bytedance / Shot2Story
Shot2Story: a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.
☆132 · Updated 4 months ago
Alternatives and similar repositories for Shot2Story
Users who are interested in Shot2Story are comparing it to the libraries listed below
- ☆186 · Updated 10 months ago
- ☆175 · Updated this week
- ☆148 · Updated 4 months ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams" ☆181 · Updated 5 months ago
- ☆76 · Updated 2 months ago
- [IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model ☆103 · Updated 2 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆130 · Updated 6 months ago
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing" ☆100 · Updated last month
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties ☆124 · Updated 6 months ago
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models ☆150 · Updated 8 months ago
- Long Context Transfer from Language to Vision ☆375 · Updated 2 months ago
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it ☆136 · Updated last year
- ☆171 · Updated last year
- [ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper ☆154 · Updated last year
- Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing" ☆127 · Updated 7 months ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions ☆128 · Updated last year
- Code repository for T2V-Turbo and T2V-Turbo-v2 ☆300 · Updated 4 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆115 · Updated 2 months ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆378 · Updated 3 weeks ago
- ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation (TMLR 2024) ☆240 · Updated 11 months ago
- Supercharged BLIP-2 that can handle videos ☆118 · Updated last year
- [CVPR 2024] MotionEditor is the first diffusion-based model capable of video motion editing. ☆168 · Updated last month
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024] ☆88 · Updated 3 months ago
- [IJCV'24] AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort ☆149 · Updated 6 months ago
- [ICLR 2024] Code for FreeNoise based on VideoCrafter ☆408 · Updated 10 months ago
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation ☆298 · Updated this week
- ☆82 · Updated last year
- Multimodal Models in Real World ☆510 · Updated 3 months ago
- Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope… ☆278 · Updated 2 months ago
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆445 · Updated 9 months ago