JaesungHuh / av-diarization
Audio-visual diarization pipeline used to create the VoxConverse dataset
☆21Updated 7 months ago
Alternatives and similar repositories for av-diarization
Users that are interested in av-diarization are comparing it to the libraries listed below
- ☆35Updated this week
- Implementation of the paper "BERTphone: Phonetically-aware Encoder Representations for Utterance-level Speaker and Language Recognition"☆17Updated 5 years ago
- Official Implementation of EnCLAP (ICASSP 2024)☆94Updated last year
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT☆40Updated last year
- [INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.☆58Updated last year
- This repository contains the baseline system for CHiME-8 MMCSG challenge focusing on transcribing both sides of a conversation where one …☆39Updated last year
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…☆40Updated 11 months ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.☆92Updated 2 years ago
- ☆28Updated 4 years ago
- The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"☆50Updated 9 months ago
- Transcribing Speech with Multinomial Diffusion, training code and models.☆81Updated 2 years ago
- ☆16Updated 2 years ago
- Collection of scripts from mHuBERT-147.☆32Updated last year
- Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat…☆52Updated last month
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning☆51Updated last year
- A package for crawling audio from YouTube☆42Updated 2 years ago
- Word Discovery in Visually Grounded, Self-Supervised Speech Models☆26Updated 2 years ago
- VoxSRC2022 workshop development kit☆19Updated 3 years ago
- An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners"☆31Updated 2 years ago
- Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clusterin…☆63Updated 2 years ago
- The VoxTube dataset official repository☆71Updated last year
- Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer☆38Updated 10 months ago
- [IJCAI'23] Learning to Speak from Text for Low-Resource TTS☆63Updated 2 years ago
- Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in Pytorch.☆89Updated 2 years ago
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI…☆24Updated 3 months ago
- Dynamic vision-guided speaker embedding for audio-visual speaker diarization