rash1993/movie-asd

repo for active speaker detection for media videos.

☆31

Alternatives and similar repositories for movie-asd

Users that are interested in movie-asd are comparing it to the libraries listed below.

Rudrabha / 8X-Super-Resolution
This repository is a repository for the paper, "Irgun: Improved residue based gradual up-scaling network for single image super resolutio…
☆16 · Aug 26, 2020 · Updated 5 years ago
JaesungHuh / VoxMovies
Evaluation script for VoxMovies dataset in PyTorch
☆23 · Jan 12, 2024 · Updated 2 years ago
kaistmm / TalkNCE
Official implementation of TalkNCE (ICASSP 2024).
☆18 · Apr 30, 2025 · Updated last year
Jiang-Yidi / TS-TalkNet
INTERSPEECH2023: Target Active Speaker Detection with Audio-visual Cues
☆61 · May 29, 2023 · Updated 3 years ago
plnguyen2908 / UniTalk-ASD-code
[Interspeech 2026] Revisiting Active Speaker Detection: An In-the-Wild Benchmark for Generalization and Robustness
☆22 · Jun 25, 2026 · Updated last month
okankop / ASDNet
Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset
☆73 · Jan 18, 2022 · Updated 4 years ago
ahaliassos / usr2
PyTorch implementation of USR 2.0 (ICLR 2026)
☆15 · Apr 3, 2026 · Updated 3 months ago
TaoRuijie / TalkNet-ASD
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
☆489 · Oct 23, 2023 · Updated 2 years ago
prajwalkr / transpotter
Official implementation of Transpotter, published in BMVC 2021
☆16 · Aug 6, 2022 · Updated 3 years ago
kaistmm / VoxMM
☆23 · May 11, 2026 · Updated 2 months ago
kaistmm / fregrad
[ICASSP 2024] Official code for FreGrad
☆35 · May 13, 2024 · Updated 2 years ago
afourast / avobjects
Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"
☆114 · Nov 16, 2020 · Updated 5 years ago
Sid2697 / Word-recognition-EmbedNet-CAB
Code implementation for our ICPR, 2020 paper titled "Improving Word Recognition using Multiple Hypotheses and Deep Embeddings"
☆21 · May 21, 2021 · Updated 5 years ago
Overcautious / ADENet
Accepted by TMM 2022
☆19 · Aug 18, 2022 · Updated 3 years ago
X-LANCE / MSDWILD
[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
☆66 · Jan 24, 2024 · Updated 2 years ago
SRA2 / SPELL
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022)
☆67 · Oct 29, 2023 · Updated 2 years ago
choijeongsoo / utut
[TASLP 2024] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
☆31 · Sep 6, 2024 · Updated last year
double125 / Graph-Matching-Attention
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
☆11 · Feb 16, 2023 · Updated 3 years ago
pratyushmaini / ssft
[NeurIPS'22] Official Repository for Characterizing Datapoints via Second-Split Forgetting
☆16 · Aug 11, 2023 · Updated 2 years ago
Blank-Wang / DCASE2018-Task4
Weakly Supervised CRNN System for Sound Event Detection With Large-scale Unlabeled In-domain Data
☆11 · Oct 31, 2018 · Updated 7 years ago
Junhua-Liao / LR-ASD
The repository for Springer IJCV 2025 (LR-ASD: Lightweight and Robust Network for Active Speaker Detection)
☆133 · Mar 23, 2025 · Updated last year
Sindhu-Hegde / gestsync
Official code for the paper "GestSync: Determining who is speaking without a talking head" published at BMVC 2023
☆48 · Sep 1, 2024 · Updated last year
nytopop / csm
A Conversational Speech Generation Model
☆14 · Mar 16, 2025 · Updated last year
LaVi-Lab / Visual-Table
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆20 · Oct 17, 2024 · Updated last year
CarolineGao / LoRA-Dataset
[NeurIPS2023] LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering
☆12 · Jan 5, 2024 · Updated 2 years ago
hyc2026 / StoryTeller
☆84 · Mar 10, 2025 · Updated last year
jingchenchen / ReasoningConsistency-VQA
☆13 · Aug 14, 2022 · Updated 3 years ago
mk-minchul / sapiensid
☆27 · Nov 17, 2025 · Updated 8 months ago
elarsonSU / egret
Evil generation of regular expression test string
☆21 · Nov 22, 2019 · Updated 6 years ago
yevvonlim / kai-presentation
Claude Code skill for KAI presentation design in HTML
☆16 · Mar 20, 2026 · Updated 4 months ago
hyounghk / CoSIm
Code and dataset for NAACL 2022 paper "CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination" Hyounghun Kim, Abhay Zala, Mohi…
☆16 · Nov 26, 2022 · Updated 3 years ago
jasonwu0731 / GettingToKnowYou
☆21 · Nov 30, 2019 · Updated 6 years ago
xianzhangzx / FINER-MLLM
The implementation of FINER-MLLM, which is accepted by MM2024.
☆18 · Oct 8, 2024 · Updated last year
prajwalkr / dogsVScats
Kaggle Cats vs. Dogs Redux Edition
☆21 · Mar 11, 2017 · Updated 9 years ago
OpenNLPLab / MMVAE-AVS
Multimodal Variational Auto-encoder based Audio-Visual Segmentation [ICCV2023].
☆20 · Sep 19, 2024 · Updated last year
yinruiqing / change_detection
Code for Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks
☆67 · Jul 14, 2020 · Updated 6 years ago
mpc001 / Visual_Speech_Recognition_for_Multiple_Languages
Visual Speech Recognition for Multiple Languages
☆479 · Aug 17, 2023 · Updated 2 years ago
sungnyun / cav2vec
(ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
☆16 · Apr 29, 2025 · Updated last year
lisha-chen / Deep-structured-facial-landmark-detection
☆20 · Oct 9, 2020 · Updated 5 years ago