facebookresearch / VisualVoice
Audio-Visual Speech Separation with Cross-Modal Consistency
☆228 Updated last year
Alternatives and similar repositories for VisualVoice:
Users who are interested in VisualVoice are comparing it to the libraries listed below.
- Deep-Learning-Based Audio-Visual Speech Enhancement and Separation ☆205 Updated last year
- Disentangled Speech Embeddings using Cross-Modal Self-Supervision ☆159 Updated 4 years ago
- UniSpeech - Large Scale Self-Supervised Learning for Speech ☆453 Updated 11 months ago
- An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-S… ☆404 Updated last year
- PPG-Based Voice Conversion ☆334 Updated 2 years ago
- A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper. ☆223 Updated last year
- Speaker embedding (d-vector) trained with GE2E loss ☆278 Updated last year
- VGGSound: A Large-scale Audio-Visual Dataset ☆309 Updated 3 years ago
- Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo! ☆347 Updated 2 years ago
- An open source dataset for source separation ☆410 Updated last year
- ☆147 Updated 2 years ago
- ICASSP 2022: 'Self-supervised Speaker Recognition with Loss-gated Learning' ☆89 Updated last year
- Official repository for RawNet, RawNet2, and RawNet3 ☆371 Updated last year
- This is the GitHub page for publicly available emotional speech data. ☆345 Updated 3 years ago
- Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196 ☆313 Updated 4 years ago
- Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion" ☆357 Updated 8 months ago
- Research code for the paper "Fine-tuning wav2vec2 for speaker recognition" found at https://arxiv.org/abs/2109.15053 ☆144 Updated 2 years ago
- A curated list of awesome voice conversion, projects and communities. ☆226 Updated 2 months ago
- Diarization scoring tools. ☆240 Updated 2 years ago
- Code for SuDoRm-Rf networks for efficient audio source separation. SuDoRm-Rf stands for SUccessive DOwnsampling and Resampling of Multi-R… ☆313 Updated last year
- Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments ☆107 Updated last year
- Include some core functions and model to handle speech separation ☆155 Updated 3 years ago
- This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab. ☆575 Updated last year
- [InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" by Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei … ☆208 Updated 2 years ago
- target speaker extraction and verification for multi-talker speech ☆175 Updated 4 years ago
- A PyTorch implementation of End-to-End Neural Diarization ☆104 Updated last year
- A library for speech data augmentation in time-domain ☆656 Updated 3 years ago
- A PyTorch implementation of "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" (see recipes in aps framework https:/… ☆209 Updated last year
- a simplified version of wav2vec(1.0, vq, 2.0) in fairseq ☆147 Updated 4 years ago
- Code for the Active Speakers in Context Paper (CVPR2020) ☆54 Updated 3 years ago