Repo for active speaker detection in media videos.
☆31Nov 19, 2023Updated 2 years ago
Alternatives and similar repositories for movie-asd
Users that are interested in movie-asd are comparing it to the libraries listed below.
- Graph learning framework for long-term video understanding☆72Jul 13, 2025Updated 11 months ago
- The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)☆177Mar 23, 2025Updated last year
- A Python script for AI speech recognition of video or audio file using whisper, stable-ts or faster-whisper and translation of subtitle u…☆10Feb 17, 2025Updated last year
- INTERSPEECH2023: Target Active Speaker Detection with Audio-visual Cues☆61May 29, 2023Updated 3 years ago
- ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'☆477Oct 23, 2023Updated 2 years ago
- [WACV 2026 Oral] LASER: Lip Landmark Assisted Speaker Detection for Robustness official implementation☆28Feb 26, 2026Updated 3 months ago
- [INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.☆64Jan 24, 2024Updated 2 years ago
- ☆22May 11, 2026Updated last month
- Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"☆115Nov 16, 2020Updated 5 years ago
- The repository for Springer IJCV 2025 (LR-ASD: Lightweight and Robust Network for Active Speaker Detection)☆118Mar 23, 2025Updated last year
- Code implementation for our ICPR, 2020 paper titled "Improving Word Recognition using Multiple Hypotheses and Deep Embeddings"☆21May 21, 2021Updated 5 years ago
- ☆15Feb 22, 2025Updated last year
- [TASLP 2024] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation☆31Sep 6, 2024Updated last year
- Datasets of audio adversarial examples for deep speech recognition systems and Python code of a detection system☆14May 6, 2023Updated 3 years ago
- ☆15Sep 17, 2022Updated 3 years ago
- [NeurIPS'22] Official Repository for Characterizing Datapoints via Second-Split Forgetting☆16Aug 11, 2023Updated 2 years ago
- Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022)☆68Oct 29, 2023Updated 2 years ago
- Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering☆11Feb 16, 2023Updated 3 years ago
- A pipeline focused on the in-painting of text in images. For example the removal of subtitles in a screenshot of a movie.☆16Jun 30, 2022Updated 3 years ago
- Official code for the paper "GestSync: Determining who is speaking without a talking head" published at BMVC 2023☆48Sep 1, 2024Updated last year
- [NeurIPS'25 Spotlight] MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation☆21Feb 23, 2025Updated last year
- Code repository for learning OpenGL☆15Jun 5, 2026Updated 2 weeks ago
- We propose MMAD, a novel automated pipeline for precise AD generation. MMAD introduces ambient music alongside visual and linguistic, enh…☆17Dec 31, 2024Updated last year
- Stealth browser automation that actually works. Runs Camoufox (custom Firefox) in Docker with zero Chrome DevTools Protocol exposure, rea…☆52Jun 10, 2026Updated last week
- A repo containing download guidance and corresponding scripts of the VoxBlink dataset.☆30Apr 16, 2024Updated 2 years ago
- ☆34Jun 2, 2023Updated 3 years ago
- ☆18Mar 14, 2026Updated 3 months ago
- ☆84Mar 10, 2025Updated last year
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147.☆21Nov 19, 2024Updated last year
- Source code of "Deep Rank Hashing Network for Cancellable Face Identification"☆12Jul 8, 2022Updated 3 years ago
- Code and dataset for NAACL 2022 paper "CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination" Hyounghun Kim, Abhay Zala, Mohi…☆16Nov 26, 2022Updated 3 years ago
- ☆21Nov 30, 2019Updated 6 years ago
- Head detection model based on Ultralytics YoloV8☆17Sep 13, 2024Updated last year
- PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin…☆22Apr 3, 2024Updated 2 years ago
- Kaggle Cats vs. Dogs Redux Edition☆21Mar 11, 2017Updated 9 years ago
- ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind (AAAI2025)☆20Apr 16, 2025Updated last year
- Automated Video Generation Solution☆24Jan 1, 2025Updated last year
- vits2 backbone with multilingual-bert, modified for Cantonese support☆26Apr 16, 2025Updated last year
- ☆19Sep 4, 2023Updated 2 years ago