[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
☆62Jan 24, 2024Updated 2 years ago
Alternatives and similar repositories for MSDWILD
Users that are interested in MSDWILD are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆16Feb 19, 2026Updated last month
- INTERSPEECH2023: Target Active Speaker Detection with Audio-visual Cues☆58May 29, 2023Updated 2 years ago
- ☆51Nov 24, 2022Updated 3 years ago
- Dynamic vision-guided speaker embedding for audio-visual speaker diarization☆12Jul 5, 2022Updated 3 years ago
- ☆21Nov 24, 2022Updated 3 years ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)☆170Mar 23, 2025Updated last year
- Accepted by TMM 2022☆19Aug 18, 2022Updated 3 years ago
- ☆42Nov 22, 2024Updated last year
- ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'☆464Oct 23, 2023Updated 2 years ago
- ☆64Jun 28, 2023Updated 2 years ago
- Audio-visual diarization pipeline used for creating VoxConverse dataset☆21Jun 6, 2025Updated 10 months ago
- Visualization tools for audio-only and multi-modal speaker diarization dataset☆13Oct 27, 2023Updated 2 years ago
- Spot the conversation: speaker diarisation in the wild☆161Jul 26, 2022Updated 3 years ago
- A CSRankings-like index for speech researchers☆35Oct 16, 2024Updated last year
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024)☆22Jul 25, 2024Updated last year
- ☆20Mar 20, 2026Updated 3 weeks ago
- Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment☆14Feb 5, 2025Updated last year
- LLaSE: Maximizing Acoustic Preservation for LLaMA based Speech Enhancement☆16Jul 11, 2025Updated 9 months ago
- ☆13Oct 25, 2024Updated last year
- Diarization scoring tools.☆263Apr 8, 2026Updated last week
- ☆59Mar 28, 2025Updated last year
- ☆16Mar 7, 2019Updated 7 years ago
- neural network based speaker embedder☆25Jan 7, 2023Updated 3 years ago
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- MagicData-RAMC Dataset and Baseline☆59Sep 13, 2022Updated 3 years ago
- Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition☆19Jul 16, 2024Updated last year
- Some comprehensive papers about speaker diarization☆345Mar 24, 2026Updated 3 weeks ago
- ☆54Oct 17, 2023Updated 2 years ago
- Development Toolkit for the VoxCeleb Speaker Recognition Challenge 2020☆43Jul 17, 2020Updated 5 years ago
- ☆24Sep 20, 2024Updated last year
- The code of the paper "Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation" (CVPR2023)☆18Mar 21, 2023Updated 3 years ago
- ☆70Feb 15, 2021Updated 5 years ago
- This repository contains the baseline system for CHiME-8 MMCSG challenge focusing on transcribing both sides of a conversation where one …☆41Mar 13, 2024Updated 2 years ago
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- The source code of Tim-TSENet☆15Apr 22, 2022Updated 3 years ago
- Automatically setup the AISHELL-4 and MSDWild dataset for usage with pyannote-database (and pyannote-audio)☆15Oct 22, 2025Updated 5 months ago
- A pytorch implementation of the paper "ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding"☆62Sep 19, 2024Updated last year
- Pytorch implementation of our paper: Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training.☆18Jul 11, 2022Updated 3 years ago
- Clustering-based methods for overlapping diarization☆83Jan 12, 2024Updated 2 years ago
- [ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach☆20Aug 2, 2021Updated 4 years ago
- End-to-End Neural Diarization☆430Aug 30, 2021Updated 4 years ago