X-LANCE / MSDWILD
[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
☆42Updated 10 months ago
Related projects
Alternatives and complementary repositories for MSDWILD
- "SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks": speech processing with the prompting paradigm☆81Updated last year
- PyTorch implementation of our paper: Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training.☆17Updated 2 years ago
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation☆24Updated 2 months ago
- Self-supervised Speaker Diarization Interspeech 2022 Implementation☆9Updated last month
- INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"☆105Updated 9 months ago
- [ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"☆50Updated last week
- A repo containing download guidance and corresponding scripts of the VoxBlink dataset.☆23Updated 7 months ago
- Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…☆30Updated last year
- This repository surveys papers focusing on Prompting and Adapters for Speech Processing.☆103Updated last year
- Clustering-based methods for overlapping diarization☆70Updated 10 months ago
- Code and data repository for paper "VoxCeleb enrichment for Age and Gender recognition" submitted at ASRU 2021☆64Updated 2 years ago
- ICASSP 2023: 'Speaker recognition with two-step multi-modal deep cleansing'☆35Updated 2 years ago
- Dynamic vision-guided speaker embedding for audio-visual speaker diarization☆11Updated 2 years ago
- Official Repository For VoxBlink2☆51Updated 3 months ago
- Official repository of NeXt-TDNN for speaker verification☆58Updated last month
- Learning differentiable temporal resolution on time-series data.☆33Updated 2 years ago
- wav2vec2 audio classification for prosodic boundary detection and other tasks☆36Updated last year
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)☆49Updated 7 months ago
- Updates ASR papers every day☆54Updated this week
- Attention Backend for Automatic Speaker Verification with Multiple Enrollment Utterances☆48Updated 2 years ago
- Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024)☆16Updated 3 months ago
- [AAAI 2024] Code for CTX-vec2wav in UniCATS☆122Updated 5 months ago
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)☆51Updated 5 months ago