Jyxarthur/shot-by-shot

[ICCV 2025] Official Implementation of "Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation". Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Eshika Khandelwal, Gül Varol, Weidi Xie, Andrew Zisserman

☆24

Alternatives and similar repositories for shot-by-shot

Users who are interested in shot-by-shot are comparing it to the repositories listed below.

Jyxarthur / AutoAD-Zero
[ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, M…
☆31 · May 16, 2026 · Updated 2 months ago
PardoAlejo / MovieCuts
Learning to cut end-to-end pretrained modules
☆38 · Apr 17, 2025 · Updated last year
cfeng16 / GPS2Pix
[CVPR 2025] GPS as a Control Signal for Image Generation
☆25 · Mar 18, 2025 · Updated last year
GaryJiajia / OFv2_ICL_VQA
[CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering
☆21 · May 28, 2025 · Updated last year
mininglamp-MLLM / HMLLM
[ACM MM2024] The code for HMLLM.
☆11 · Oct 27, 2024 · Updated last year
wenyu1009 / RTSRN
☆20 · Sep 19, 2023 · Updated 2 years ago
GXYM / VCapsBench
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
☆20 · Jun 2, 2025 · Updated last year
yeliudev / nncore
📦 A lightweight machine learning toolkit for researchers, providing common model design & learning functionalities.
☆29 · Jul 9, 2026 · Updated 2 weeks ago
SilentView / EMCID
Official Implementation for "Editing Massive Concepts in Text-to-Image Diffusion Models"
☆19 · Mar 21, 2024 · Updated 2 years ago
ljzycmd / SCD
Consistent Human Image and Video Generation with Spatially Conditioned Diffusion
☆16 · Sep 1, 2025 · Updated 10 months ago
ForJadeForest / Lever-LM
The Code for Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
☆18 · Oct 4, 2024 · Updated last year
NarcissusEx / VividDreamer
☆17 · Feb 20, 2025 · Updated last year
google-deepmind / wyd-benchmark
☆28 · Mar 3, 2025 · Updated last year
yongliang-wu / ExploreCfg
[NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning
☆47 · Nov 26, 2024 · Updated last year
VidCapBench / VidCapBench
☆13 · May 17, 2025 · Updated last year
runjiali-rl / threestudio-dreambeast
[3DV 2025]🐱🐶🐲🐮🐷Official Implementation of DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer
☆67 · Mar 20, 2025 · Updated last year
zhentao-zou / MURE
Beyond Textual CoT: Interleaved Text-image chains with Deep Confidence Reasoning for Image Editing
☆19 · Jun 24, 2026 · Updated last month
HDUyiming / SOCCER
We are very happy that our work has been accepted by ACM Multimedia 2024! 🥰
☆12 · Jan 8, 2025 · Updated last year
adobe-research / llava-score
☆11 · Oct 2, 2024 · Updated last year
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 3 years ago
nishadsinghi / sc-genrm-scaling
[COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…
☆15 · Oct 31, 2025 · Updated 8 months ago
Vchitect / ShotBench
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
☆102 · Sep 12, 2025 · Updated 10 months ago
Hoar012 / TDC-Video
Official implementation of TDC.
☆15 · Jul 22, 2025 · Updated last year
CCIIPLab / DPT
The code of IJCAI2022 paper, Declaration-based Prompt Tuning for Visual Question Answering
☆20 · May 10, 2022 · Updated 4 years ago
Jiahao000 / VICT
[CVPR 2025] Test-Time Visual In-Context Tuning
☆30 · Dec 31, 2025 · Updated 6 months ago
UCSB-AI / via-video
☆25 · May 12, 2026 · Updated 2 months ago
FrankYang-17 / Mavors
☆16 · May 30, 2025 · Updated last year
Pan-xiaokai / DomainPlus-Network
DomainPlus: Cross-Transform Domain Learning towards High Dynamic Range Imaging
☆12 · Oct 11, 2022 · Updated 3 years ago
LgQu / TIGeR
Code for paper: Unified Text-to-Image Generation and Retrieval
☆16 · Jul 19, 2026 · Updated last week
Gaiejj / align-anything
☆16 · Nov 11, 2025 · Updated 8 months ago
tsb0601 / MultiMon
☆25 · Jun 22, 2023 · Updated 3 years ago
dawitmureja / AVE
This is the official repository for our ECCV 2022 paper titled, "The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assis…
☆53 · Nov 28, 2022 · Updated 3 years ago
notwaldorf / old-research-papers
Old Reinforcement Learning research from university
☆10 · Jan 4, 2017 · Updated 9 years ago
gemlab-vt / motionshop
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
☆26 · Dec 12, 2024 · Updated last year
JustinYuu / MM_Pyramid
[ACM MM 2022] MM_Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
☆15 · Aug 26, 2022 · Updated 3 years ago
llyx97 / video_reason_bench
[ICLR 2026] "VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?", Yuanxin Liu, Kun Ouyang, Haoning Wu, Yi Liu, L…
☆41 · Jan 30, 2026 · Updated 5 months ago
shuoyang129 / eamat
Entity-Aware and Motion-Aware Transformers for Language-driven Action Localization (IJCAI-22)
☆12 · Oct 11, 2022 · Updated 3 years ago
fyyCS / LSLD
☆14 · Nov 13, 2023 · Updated 2 years ago
YuanJianhao508 / LikePhys
[ICLR2026] LikePhys, a training-free method that evaluates intuitive physics in video diffusion models by distinguishing physically valid…
☆16 · Mar 5, 2026 · Updated 4 months ago