TaoRuijie / TalkNet-ASD
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
☆377 · Updated last year
Alternatives and similar repositories for TalkNet-ASD
Users who are interested in TalkNet-ASD are comparing it to the repositories listed below.
- The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection) ☆135 · Updated last month
- Out of time: automated lip sync in the wild ☆762 · Updated last year
- Audio-Visual Speech Separation with Cross-Modal Consistency ☆230 · Updated last year
- Visual Speech Recognition for Multiple Languages ☆405 · Updated last year
- Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset ☆61 · Updated 3 years ago
- A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper. ☆227 · Updated last year
- This is the GitHub page for publicly available emotional speech data. ☆347 · Updated 3 years ago
- ☆419 · Updated last year
- Disentangled Speech Embeddings using Cross-Modal Self-Supervision ☆159 · Updated 5 years ago
- The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the … ☆159 · Updated 2 years ago
- Crowd Sourced Emotional Multimodal Actors Dataset (CREMA-D) ☆428 · Updated last month
- Deep-Learning-Based Audio-Visual Speech Enhancement and Separation ☆208 · Updated 2 years ago
- Official Implementation of Visual Transformer Pooling for Lip reading ☆40 · Updated 2 years ago
- A collection of datasets for the purpose of emotion recognition/detection in speech. ☆336 · Updated 7 months ago
- Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo! ☆351 · Updated 3 years ago
- Auto-AVSR: Lip-Reading Sentences Project ☆338 · Updated 4 months ago
- ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASS… ☆412 · Updated last year
- [CVPR] MARLIN: Masked Autoencoder for facial video Representation LearnINg ☆248 · Updated last month
- PPG-Based Voice Conversion ☆336 · Updated 2 years ago
- Code for the Active Speakers in Context Paper (CVPR2020) ☆54 · Updated 3 years ago
- Official code for the paper "Visual Speech Enhancement Without A Real Visual Stream" published at WACV 2021 ☆107 · Updated 11 months ago
- Official repository for the paper VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices ☆66 · Updated last year
- MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020] ☆261 · Updated 10 months ago
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation ☆155 · Updated last week
- ☆163 · Updated 10 months ago
- A curated list of awesome voice conversion, projects and communities. ☆232 · Updated 4 months ago
- In defence of metric learning for speaker recognition ☆1,099 · Updated last year
- Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when train only in Vox2) ☆682 · Updated last year
- A self-supervised learning framework for audio-visual speech ☆902 · Updated last year
- PyTorch Implementation for Paper "Emotionally Enhanced Talking Face Generation" (ICCVW'23 and ACM-MMW'23) ☆365 · Updated 4 months ago