Repository for active speaker detection in media videos.
☆31Nov 19, 2023Updated 2 years ago
Alternatives and similar repositories for movie-asd
Users that are interested in movie-asd are comparing it to the libraries listed below.
- Evaluation script for VoxMovies dataset in PyTorch☆23Jan 12, 2024Updated 2 years ago
- The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)☆170Mar 23, 2025Updated last year
- INTERSPEECH2023: Target Active Speaker Detection with Audio-visual Cues☆58May 29, 2023Updated 2 years ago
- Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset☆72Jan 18, 2022Updated 4 years ago
- ☆23Nov 17, 2025Updated 4 months ago
- [WACV 2026] LASER: Lip Landmark Assisted Speaker Detection for Robustness official implementation☆24Feb 26, 2026Updated last month
- ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'☆462Oct 23, 2023Updated 2 years ago
- [INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.☆59Jan 24, 2024Updated 2 years ago
- [ICASSP 2024] Official code for FreGrad☆35May 13, 2024Updated last year
- Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"☆115Nov 16, 2020Updated 5 years ago
- ☆10Dec 22, 2023Updated 2 years ago
- ☆14Feb 22, 2025Updated last year
- Accepted by TMM 2022☆19Aug 18, 2022Updated 3 years ago
- [TASLP 2024] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation☆31Sep 6, 2024Updated last year
- Datasets of audio adversarial examples for deep speech recognition systems and Python code of a detection system☆13May 6, 2023Updated 2 years ago
- Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022)☆68Oct 29, 2023Updated 2 years ago
- Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering☆11Feb 16, 2023Updated 3 years ago
- (ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation☆15Apr 29, 2025Updated 11 months ago
- Official code for the paper "GestSync: Determining who is speaking without a talking head" published at BMVC 2023☆47Sep 1, 2024Updated last year
- We propose MMAD, a novel automated pipeline for precise AD generation. MMAD introduces ambient music alongside visual and linguistic, enh…☆17Dec 31, 2024Updated last year
- [NeurIPS2023] LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering☆13Jan 5, 2024Updated 2 years ago
- A library for exporting models including NeMo and Hugging Face to optimized inference backends, and deploying them for efficient querying☆32Updated this week
- ☆83Mar 10, 2025Updated last year
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147.☆21Nov 19, 2024Updated last year
- Visual Speech Recognition for Multiple Languages☆461Aug 17, 2023Updated 2 years ago
- Python library for finding similar content in videos.☆16Nov 29, 2023Updated 2 years ago
- The implementation of FINER-MLLM, which is accepted by MM2024.☆18Oct 8, 2024Updated last year
- PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin…☆21Apr 3, 2024Updated last year
- A curated list of Story Ending Generation models; DASFAA'22: Incorporating Commonsense Knowledge into Story Ending Generation via Heterog…☆14May 12, 2022Updated 3 years ago
- Kaggle Cats vs. Dogs Redux Edition☆21Mar 11, 2017Updated 9 years ago
- Multimodal Variational Auto-encoder based Audio-Visual Segmentation [ICCV2023].☆20Sep 19, 2024Updated last year
- Rust standalone inference of Namo-500M series models. Extremely tiny, running VLM on CPU.☆24Mar 12, 2025Updated last year
- ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind (AAAI2025)☆19Apr 16, 2025Updated 11 months ago
- ☆13Aug 14, 2022Updated 3 years ago
- ☆19Sep 4, 2023Updated 2 years ago
- ☆20Oct 9, 2020Updated 5 years ago
- Check the source code—this is how we can create a rounded popup in a Chrome extension.☆15May 15, 2025Updated 10 months ago
- A tool for generating python inference pipeline☆50Mar 23, 2026Updated last week
- ☆24Sep 20, 2024Updated last year