Repo for active speaker detection in media videos.
☆31 · Nov 19, 2023 · Updated 2 years ago
Alternatives and similar repositories for movie-asd
Users who are interested in movie-asd are comparing it to the repositories listed below.
- The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection) ☆173 · Mar 23, 2025 · Updated last year
- INTERSPEECH2023: Target Active Speaker Detection with Audio-visual Cues ☆61 · May 29, 2023 · Updated 3 years ago
- ☆26 · Nov 17, 2025 · Updated 6 months ago
- ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection' ☆474 · Oct 23, 2023 · Updated 2 years ago
- [INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild. ☆64 · Jan 24, 2024 · Updated 2 years ago
- ☆22 · May 11, 2026 · Updated 2 weeks ago
- Official implementation of Transpotter, published in BMVC 2021 ☆16 · Aug 6, 2022 · Updated 3 years ago
- [ICASSP 2024] Official code for FreGrad ☆35 · May 13, 2024 · Updated 2 years ago
- Accepted by TMM 2022 ☆19 · Aug 18, 2022 · Updated 3 years ago
- ☆15 · Feb 22, 2025 · Updated last year
- ☆15 · Feb 28, 2022 · Updated 4 years ago
- Datasets of audio adversarial examples for deep speech recognition systems and Python code for a detection system ☆14 · May 6, 2023 · Updated 3 years ago
- [NeurIPS'22] Official Repository for Characterizing Datapoints via Second-Split Forgetting ☆16 · Aug 11, 2023 · Updated 2 years ago
- Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022) ☆68 · Oct 29, 2023 · Updated 2 years ago
- Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering ☆11 · Feb 16, 2023 · Updated 3 years ago
- (ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation ☆16 · Apr 29, 2025 · Updated last year
- A pipeline focused on the in-painting of text in images, for example, the removal of subtitles from a screenshot of a movie. ☆16 · Jun 30, 2022 · Updated 3 years ago
- Official code for the paper "GestSync: Determining who is speaking without a talking head" published at BMVC 2023 ☆48 · Sep 1, 2024 · Updated last year
- [NeurIPS'25 Spotlight] MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation ☆21 · Feb 23, 2025 · Updated last year
- We propose MMAD, a novel automated pipeline for precise AD generation. MMAD introduces ambient music alongside visual and linguistic, enh… ☆17 · Dec 31, 2024 · Updated last year
- A Conversational Speech Generation Model ☆14 · Mar 16, 2025 · Updated last year
- A repo containing download guidance and corresponding scripts for the VoxBlink dataset. ☆30 · Apr 16, 2024 · Updated 2 years ago
- A Python package implementing a robust and effective defogging/dehazing method ☆15 · Dec 30, 2018 · Updated 7 years ago
- ☆34 · Jun 2, 2023 · Updated 2 years ago
- [NeurIPS2023] LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering ☆13 · Jan 5, 2024 · Updated 2 years ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆20 · Oct 17, 2024 · Updated last year
- ☆83 · Mar 10, 2025 · Updated last year
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆21 · Nov 19, 2024 · Updated last year
- ☆21 · Nov 30, 2019 · Updated 6 years ago
- PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin… ☆22 · Apr 3, 2024 · Updated 2 years ago
- Python (pip) package for fitting mixtures of Student's t-distributions using either maximum likelihood (EM) or Bayesian methodology (vari… ☆11 · Sep 23, 2025 · Updated 8 months ago
- A curated list of Story Ending Generation models; DASFAA'22: Incorporating Commonsense Knowledge into Story Ending Generation via Heterog… ☆14 · May 12, 2022 · Updated 4 years ago
- Multimodal Variational Auto-encoder based Audio-Visual Segmentation [ICCV2023]. ☆20 · Sep 19, 2024 · Updated last year
- Standalone Rust inference for Namo-500M series models. Extremely tiny; runs a VLM on CPU. ☆24 · Mar 12, 2025 · Updated last year
- ☆17 · Sep 27, 2020 · Updated 5 years ago
- ☆13 · Aug 14, 2022 · Updated 3 years ago
- ☆24 · Sep 20, 2024 · Updated last year
- MultiOCR, an interface that connects multiple open-source OCR engines and various cloud OCR services. ☆32 · Aug 19, 2023 · Updated 2 years ago
- Code for ICCV2021: Discovering Human Interactions with Large-Vocabulary Objects via Query and Multi-Scale Detection ☆28 · Oct 12, 2021 · Updated 4 years ago