TaoRuijie / TalkNet-ASD
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
☆346 Updated last year
Alternatives and similar repositories for TalkNet-ASD:
Users who are interested in TalkNet-ASD are comparing it to the repositories listed below:
- Audio-Visual Speech Separation with Cross-Modal Consistency ☆227 Updated last year
- The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection) ☆117 Updated 10 months ago
- A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper. ☆219 Updated last year
- Visual Speech Recognition for Multiple Languages ☆385 Updated last year
- Out of time: automated lip sync in the wild ☆723 Updated last year
- Deep-Learning-Based Audio-Visual Speech Enhancement and Separation ☆205 Updated last year
- The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the… ☆157 Updated last year
- ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASS… ☆408 Updated last year
- Code for the Active Speakers in Context Paper (CVPR2020) ☆54 Updated 3 years ago
- A curated list of awesome voice conversion, projects and communities. ☆221 Updated last month
- In defence of metric learning for speaker recognition ☆1,082 Updated 10 months ago
- A collection of datasets for the purpose of emotion recognition/detection in speech. ☆312 Updated 4 months ago
- This is the GitHub page for publicly available emotional speech data. ☆336 Updated 3 years ago
- Official Implementation of Visual Transformer Pooling for Lip reading ☆40 Updated 2 years ago
- PPG-Based Voice Conversion ☆332 Updated 2 years ago
- INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. … ☆656 Updated last month
- Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo! ☆344 Updated 2 years ago
- A self-supervised learning framework for audio-visual speech ☆875 Updated last year
- UniSpeech - Large Scale Self-Supervised Learning for Speech ☆449 Updated 10 months ago
- Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit ☆832 Updated last month
- ☆154 Updated 7 months ago
- Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation ☆576 Updated 3 weeks ago
- End-to-End Neural Diarization ☆395 Updated 3 years ago
- Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when train only in Vox2) ☆644 Updated 10 months ago
- Disentangled Speech Embeddings using Cross-Modal Self-Supervision ☆156 Updated 4 years ago
- Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset ☆59 Updated 3 years ago
- [ICASSP 2023] Official Tensorflow implementation of "Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech E… ☆168 Updated 9 months ago
- Speaker embedding (d-vector) trained with GE2E loss ☆276 Updated last year
- A must-read paper for speech separation based on neural networks ☆770 Updated 2 years ago
- Auto-AVSR: Lip-Reading Sentences Project ☆311 Updated last month