facebookresearch / VisualVoice
Audio-Visual Speech Separation with Cross-Modal Consistency
☆227 · Updated last year
Alternatives and similar repositories for VisualVoice:
Users who are interested in VisualVoice compare it to the repositories listed below.
- Deep-Learning-Based Audio-Visual Speech Enhancement and Separation ☆205 · Updated last year
- A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper. ☆219 · Updated last year
- Disentangled Speech Embeddings using Cross-Modal Self-Supervision ☆156 · Updated 4 years ago
- The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection) ☆116 · Updated 10 months ago
- Include some core functions and model to handle speech separation ☆155 · Updated 3 years ago
- VGGSound: A Large-scale Audio-Visual Dataset ☆303 · Updated 3 years ago
- An open source dataset for source separation ☆405 · Updated last year
- Audio-Visual Speech Recognition using Sequence to Sequence Models ☆82 · Updated 4 years ago
- Code for the Active Speakers in Context Paper (CVPR2020) ☆54 · Updated 3 years ago
- An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-S… ☆400 · Updated last year
- PPG-Based Voice Conversion ☆332 · Updated 2 years ago
- The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement". ☆252 · Updated 9 months ago
- Official Implementation of Visual Transformer Pooling for Lip reading ☆39 · Updated 2 years ago
- Pytorch code for End-to-End Audiovisual Speech Recognition ☆174 · Updated 2 years ago
- Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments ☆107 · Updated 11 months ago
- Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset ☆59 · Updated 3 years ago
- Official code for the paper "Visual Speech Enhancement Without A Real Visual Stream" published at WACV 2021 ☆104 · Updated 8 months ago
- ☆32 · Updated 3 months ago
- UniSpeech - Large Scale Self-Supervised Learning for Speech ☆449 · Updated 10 months ago
- The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the… ☆157 · Updated last year
- This is the GitHub page for publicly available emotional speech data. ☆336 · Updated 3 years ago
- The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmente… ☆108 · Updated last year
- ☆145 · Updated 2 years ago
- Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video" ☆111 · Updated 4 years ago
- Code for SuDoRm-Rf networks for efficient audio source separation. SuDoRm-Rf stands for SUccessive DOwnsampling and Resampling of Multi-R… ☆314 · Updated last year
- Official repository for RawNet, RawNet2, and RawNet3 ☆369 · Updated 11 months ago
- Executable code based on Google articles ☆165 · Updated 2 years ago
- Speaker embedding (d-vector) trained with GE2E loss ☆276 · Updated last year
- transform-average-concatenate (TAC) method for end-to-end microphone permutation and number invariant ad-hoc beamforming. ☆265 · Updated 3 years ago
- Research code for the paper "Fine-tuning wav2vec2 for speaker recognition" found at https://arxiv.org/abs/2109.15053 ☆144 · Updated 2 years ago