TaoRuijie / TalkNet-ASD

ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'

☆422

Alternatives and similar repositories for TalkNet-ASD

Users who are interested in TalkNet-ASD are comparing it to the repositories listed below.

Junhua-Liao / Light-ASD
The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)
☆156 Updated 7 months ago
mpc001 / Visual_Speech_Recognition_for_Multiple_Languages
Visual Speech Recognition for Multiple Languages
☆445 Updated 2 years ago
facebookresearch / VisualVoice
Audio-Visual Speech Separation with Cross-Modal Consistency
☆237 Updated 2 years ago
okankop / ASDNet
Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset
☆68 Updated 3 years ago
smeetrs / deep_avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
☆238 Updated last year
mpc001 / Lipreading_using_Temporal_Convolutional_Networks
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASS…
☆422 Updated 2 years ago
CheyneyComputerScience / CREMA-D
Crowd Sourced Emotional Multimodal Actors Dataset (CREMA-D)
☆473 Updated 7 months ago
joonson / syncnet_python
Out of time: automated lip sync in the wild
☆826 Updated last year
facebookresearch / av_hubert
A self-supervised learning framework for audio-visual speech
☆946 Updated last year
VIPL-Audio-Visual-Speech-Understanding / learn-an-effective-lip-reading-model-without-pains
The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the…
☆162 Updated last month
mpc001 / auto_avsr
Auto-AVSR: Lip-Reading Sentences Project
☆385 Updated 9 months ago
prajwalkr / vtp
Official Implementation of Visual Transformer Pooling for Lip reading
☆40 Updated 3 years ago
SJTUwxz / LoCoNet_ASD
code repo for LoCoNet: Long-Short Context Network for Active Speaker Detection
☆41 Updated 2 years ago
afourast / deep_lip_reading
Code and models for evaluating a state-of-the-art lip reading network
☆197 Updated 2 years ago
SRA2 / SPELL
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022)
☆67 Updated 2 years ago
joonson / syncnet_trainer
Disentangled Speech Embeddings using Cross-Modal Self-Supervision
☆163 Updated 5 years ago
X-LANCE / MSDWILD
[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
☆55 Updated last year
SuperKogito / SER-datasets
A collection of datasets for the purpose of emotion recognition/detection in speech.
☆381 Updated last year
Chris10M / Lip2Speech
A pipeline to read lips and generate speech for the read content, i.e., Lip to Speech Synthesis.
☆93 Updated 3 months ago
HLTSingapore / Emotional-Speech-Data
This is the GitHub page for publicly available emotional speech data.
☆365 Updated 3 years ago
thuhcsi / SECap
☆172 Updated last year
fuankarion / active-speakers-context
Code for the Active Speakers in Context Paper (CVPR2020)
☆55 Updated 4 years ago
XinhaoMei / WavCaps
This repository contains metadata of the WavCaps dataset and code for downstream tasks.
☆248 Updated last year
JeffC0628 / awesome-voice-conversion
A curated list of awesome voice conversion projects and communities.
☆249 Updated 9 months ago
joannahong / Lip2Wav-pytorch
A PyTorch implementation of Lip2Wav
☆51 Updated 3 years ago
burchim / AVEC
[WACV 2023] Audio-Visual Efficient Conformer (AVEC) for Robust Speech Recognition
☆99 Updated 2 years ago
hche11 / VGGSound
VGGSound: A Large-scale Audio-Visual Dataset
☆336 Updated 4 years ago
ddlBoJack / emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training fo…
☆970 Updated 10 months ago
liusongxiang / ppg-vc
PPG-Based Voice Conversion
☆348 Updated 3 years ago
vskadandale / vocalist
Official repository for the paper VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
☆68 Updated last year