[ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
☆28Jan 28, 2025Updated last year
Alternatives and similar repositories for AutoAD-Zero
Users that are interested in AutoAD-Zero are comparing it to the libraries listed below
Sorting:
- [ICCV 2025] Official Implementation of "Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation". Junyu Xie, Tengda H…☆21Jul 26, 2025Updated 7 months ago
- [CVPR'23 Highlight] AutoAD: Movie Description in Context.☆103Nov 6, 2024Updated last year
- [CVPR'22 Oral] Temporal Alignment Networks for Long-term Video. Tengda Han, Weidi Xie, Andrew Zisserman.☆119Oct 9, 2023Updated 2 years ago
- Website-based resource monitor for Slurm system☆37Apr 6, 2023Updated 2 years ago
- This dataset is presented in the paper Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video…☆12Sep 21, 2022Updated 3 years ago
- ☆16Jan 6, 2026Updated last month
- ☆20Dec 29, 2024Updated last year
- Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)☆54Apr 15, 2024Updated last year
- [ECCV 2022] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing☆27Jul 15, 2022Updated 3 years ago
- ☆24Mar 30, 2024Updated last year
- Content for cloud computing workshop☆15Apr 20, 2018Updated 7 years ago
- PostgreSQL extension for vector search, embeddings, and ML, plus NeuronAgent runtime and NeuronMCP server.☆41Feb 3, 2026Updated last month
- Watch, read and lookup: learning to spot signs from multiple supervisors, ACCV 2020 (Best Application Paper)☆33Apr 10, 2023Updated 2 years ago
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆32May 15, 2023Updated 2 years ago
- Narrative movie understanding benchmark☆76Jun 11, 2025Updated 8 months ago
- Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)☆70Jan 4, 2026Updated 2 months ago
- [NeurIPS 2022] Segmenting Moving Objects via an Object-Centric Representation. Junyu Xie, Weidi Xie, Andrew Zisserman.☆32Dec 20, 2023Updated 2 years ago
- [ECCV 2024] Code for Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation☆34Mar 7, 2025Updated 11 months ago
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'☆33Nov 7, 2023Updated 2 years ago
- Beyond Accuracy: What Matters in Designing Well-Behaved Models?☆18Dec 9, 2025Updated 2 months ago
- [NeurIPS'20] Self-supervised Co-Training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman.☆289Oct 10, 2021Updated 4 years ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆83Jul 1, 2024Updated last year
- A tiny, didactical implementation of LLAMA 3☆42Dec 2, 2024Updated last year
- ☆14Aug 28, 2024Updated last year
- Orchestration middleware for Home Assistant + Ollama: enables 8-20B models to handle complex multi-intent commands through intelligent ta…☆23Feb 6, 2026Updated 3 weeks ago
- Collection of useful FFMPEG commands for processing audio and video files.☆44Jan 29, 2019Updated 7 years ago
- Repo for our work "Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence"☆19Jun 2, 2025Updated 9 months ago
- This is an official pytorch implementation of Learning To Recognize Procedural Activities with Distant Supervision. In this repository, w…☆43Feb 21, 2023Updated 3 years ago
- Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)☆40Oct 2, 2022Updated 3 years ago
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"☆82Oct 15, 2025Updated 4 months ago
- 🕵️♂️🔊 Automatically update Audio Deepfake Detection (ADD) papers daily using GitHub Actions (updates every 12 hours)☆17Feb 13, 2026Updated 2 weeks ago
- ☆10Oct 20, 2022Updated 3 years ago
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition"☆12Feb 27, 2024Updated 2 years ago
- Speech Security and Privacy Compendium - Mini☆10Jun 18, 2024Updated last year
- ☆10Feb 19, 2021Updated 5 years ago
- ☆14Sep 17, 2024Updated last year
- About The corresponding code from our paper " Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning…☆13Jan 14, 2026Updated last month
- Object detection using Pascal's datasets and using pbtxt to denote dictionary of classes.☆10Mar 15, 2018Updated 7 years ago
- Fine-tuning Llama2-7b and other llms for categorising emails for Deutsche Bahn (German National Railways)☆13Oct 9, 2023Updated 2 years ago