bytedance / Shot2Story
Shot2Story: a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.
☆131Updated 3 months ago
Alternatives and similar repositories for Shot2Story
Users who are interested in Shot2Story are comparing it to the repositories listed below.
- ☆186Updated 10 months ago
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties☆124Updated 6 months ago
- ☆144Updated 3 months ago
- ☆176Updated 10 months ago
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models☆146Updated 7 months ago
- [IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model☆103Updated last month
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"☆96Updated 3 weeks ago
- Code repository for T2V-Turbo and T2V-Turbo-v2☆299Updated 3 months ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions☆129Updated last year
- ☆75Updated 2 months ago
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation☆289Updated 2 months ago
- [ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper☆153Updated last year
- [AAAI 2025] Official pytorch implementation of "VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion …☆158Updated last year
- [IJCV'24] AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort☆150Updated 5 months ago
- Long Context Transfer from Language to Vision☆374Updated last month
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!☆109Updated 2 months ago
- Multimodal Models in Real World☆503Updated 2 months ago
- Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope…☆274Updated 2 months ago
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"☆438Updated 8 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆99Updated 3 weeks ago
- ☆171Updated last year
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024]☆88Updated 3 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional image generation models. (ICLR 2024)☆168Updated 3 weeks ago
- Supercharged BLIP-2 that can handle videos☆117Updated last year
- Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g…☆363Updated 2 weeks ago
- ☆63Updated 8 months ago
- Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing"☆122Updated 6 months ago
- [ICLR 2025] VideoGrain: This repo is the official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video …☆124Updated last month
- [ICLR 2025] HQ-Edit: A High-Quality and High-Coverage Dataset for General Image Editing☆100Updated last year
- [CVPR2024] MotionEditor is the first diffusion-based model capable of video motion editing.☆167Updated 3 weeks ago