X-LANCE / MSDWILD
[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
☆41 Updated 9 months ago
Related projects
Alternatives and complementary repositories for MSDWILD
- ☆45 Updated last year
- ☆26 Updated last year
- ☆59 Updated last month
- A repo containing download guidance and corresponding scripts for the VoxBlink dataset. ☆22 Updated 6 months ago
- [ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing" ☆47 Updated 2 weeks ago
- "SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks": speech processing with the prompting paradigm ☆81 Updated last year
- Dynamic vision-guided speaker embedding for audio-visual speaker diarization ☆11 Updated 2 years ago
- Clustering-based methods for overlapping diarization ☆68 Updated 9 months ago
- ☆49 Updated 5 months ago
- Official Repository For VoxBlink2 ☆49 Updated 2 months ago
- ☆69 Updated last year
- Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in PyTorch. ☆58 Updated last year
- ☆50 Updated 9 months ago
- Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024) ☆14 Updated 3 months ago
- Implementation of Self-supervised Speaker Diarization (Interspeech 2022) ☆9 Updated last month
- wav2vec2 audio classification for prosodic boundary detection and other tasks ☆34 Updated last year
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆47 Updated 6 months ago
- [AAAI 2024] Code for CTX-vec2wav in UniCATS ☆122 Updated 4 months ago
- This repository surveys papers focusing on prompting and adapters for speech processing. ☆103 Updated last year
- [INTERSPEECH 2023] Target Active Speaker Detection with Audio-visual Cues ☆46 Updated last year
- ☆62 Updated 9 months ago
- Official repository of NeXt-TDNN for speaker verification ☆54 Updated last month
- ☆32 Updated 3 years ago
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ☆157 Updated 3 months ago
- Attention Backend for Automatic Speaker Verification with Multiple Enrollment Utterances ☆48 Updated 2 years ago
- Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an… ☆29 Updated last year
- ☆14 Updated 2 years ago
- Official implementation for the paper: A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Unit… ☆73 Updated last year
- Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion (Interspeech 2022) ☆111 Updated 9 months ago
- PyTorch implementation of our paper: Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training. ☆17 Updated 2 years ago