JaesungHuh/av-diarization

Audio-visual diarization pipeline used to create the VoxConverse dataset

☆22

Alternatives and similar repositories for av-diarization

Users who are interested in av-diarization are comparing it to the repositories listed below.

xiaoxiaomiao323 / MSA
☆16 · Feb 19, 2026 · Updated 5 months ago
JaesungHuh / VoxMovies
Evaluation script for VoxMovies dataset in PyTorch
☆23 · Jan 12, 2024 · Updated 2 years ago
exercise-book-yq / FreeCodec
FREECODEC: A DISENTANGLED NEURAL SPEECH CODEC WITH FEWER TOKENS
☆24 · Sep 9, 2024 · Updated last year
intflow / KICT_GC2020_eval500
Public dataset developed by KICT_INTFLOW for IITP AI GrandChallenge 2019, Track-3
☆13 · Mar 4, 2020 · Updated 6 years ago
BUTSpeechFIT / DVBx
Discriminative Training of VBx Diarization
☆28 · Sep 23, 2024 · Updated last year
X-LANCE / MSDWILD
[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
☆65 · Jan 24, 2024 · Updated 2 years ago
jiwonix / Sound-Event-Detection-papers
Sound Event Detection (SED) paper collection
☆15 · Jun 26, 2024 · Updated 2 years ago
fgnt / speaker_reassignment
Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment
☆14 · Feb 5, 2025 · Updated last year
TaoRuijie / MFV-KSD
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024)
☆22 · Jul 25, 2024 · Updated last year
etri / kmsav
☆14 · Oct 25, 2024 · Updated last year
Maokui-He / NSD-MA-MSE
A pytorch implementation of the paper "ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding"
☆62 · Sep 19, 2024 · Updated last year
cjchun313 / intflowkict_2020_AI_Grand_Challenge
2020 AI Grand Challenge (3rd track) - public sample
☆16 · Jan 20, 2021 · Updated 5 years ago
JeongHun0716 / vsr-low
Visual Speech Recognition For Low-Resource Languages with Automatic Labels (ICASSP 2024)
☆17 · Mar 17, 2025 · Updated last year
JusperLee / Gull-Codec-Training
☆12 · Mar 11, 2025 · Updated last year
kaistmm / voxceleb-disentangler
[INTERSPEECH 2024] Official pytorch code for the paper "Disentangled Representation Learning for Environment-agnostic Speaker Recognition…
☆18 · Jul 23, 2024 · Updated 2 years ago
jagabandhumishra / W2V-E2E-Language-Diarization
☆11 · Sep 4, 2023 · Updated 2 years ago
VoxBlink2 / ScriptsForVoxBlink2
Official Repository For VoxBlink2
☆88 · Aug 13, 2024 · Updated last year
llm-lab-org / CLASP
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval
☆13 · Jun 27, 2025 · Updated last year
Sindhu-Hegde / multivsr
Official code for the paper "Scaling Multilingual Visual Speech Recognition"
☆20 · Aug 15, 2025 · Updated 11 months ago
zaocan666 / DyViSE
Dynamic vision-guided speaker embedding for audio-visual speaker diarization
☆12 · Jul 5, 2022 · Updated 4 years ago
hohsiangwu / rethinking-visual-sound-localization
Official implementation of the paper How to Listen? Rethinking Visual Sound Localization
☆18 · Apr 25, 2022 · Updated 4 years ago
Bartelds / ctc-dro
Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition.
☆17 · May 16, 2025 · Updated last year
Saurabhbhati / DASS
☆12 · Apr 26, 2025 · Updated last year
BUTSpeechFIT / diacorrect
Error correction back-end for speaker diarization
☆18 · Sep 26, 2023 · Updated 2 years ago
lwang114 / GraphUnsupASR
☆10 · Apr 17, 2024 · Updated 2 years ago
Mu-Y / DiariST
☆18 · Sep 19, 2023 · Updated 2 years ago
pilot7747 / VoxDIY
This repository provides data and code for "Vox Populi, Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription" paper.
☆16 · Jul 22, 2021 · Updated 5 years ago
rishikksh20 / NU-Wave-pytorch
NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling
☆37 · May 25, 2021 · Updated 5 years ago
jhcodec843 / jhcodec
☆48 · Mar 17, 2026 · Updated 4 months ago
chimechallenge / chime-utils
Scripts for data generation, scoring and data manifest preparation for CHiME-8 DASR task.
☆26 · Feb 25, 2025 · Updated last year
joonaskalda / PixIT
Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings…
☆105 · Jan 10, 2025 · Updated last year
BUTSpeechFIT / DiaPer
☆69 · Feb 8, 2024 · Updated 2 years ago
ina-foss / InaGVAD
Voice activity detection and speaker gender segmentation audiovisual corpus
☆16 · Jan 20, 2025 · Updated last year
linjac / GenDARA
☆13 · Jan 14, 2025 · Updated last year
Audio-WestlakeU / FS-EEND
The official Pytorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based …
☆183 · May 7, 2026 · Updated 2 months ago
JaesungHuh / ca-subtitle
Implementation of "Look, Listen and Recognise: character-aware audio-visual subtitling"
☆21 · Nov 3, 2025 · Updated 8 months ago
wondervictor / RetinaNet-Text-Detection
Text Detection by RetinaNet with PyTorch (Code will be released soon)
☆10 · Dec 1, 2018 · Updated 7 years ago
Alexander-H-Liu / dinosr
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
☆53 · Jan 18, 2024 · Updated 2 years ago
zjzser / WMCodec
PyTorch Implementation of [WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification](https://arxiv.or…
☆18 · Jul 31, 2025 · Updated 11 months ago