kaistmm/VoxMM

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/kaistmm/VoxMM)

kaistmm / VoxMM

☆23

Alternatives and similar repositories for VoxMM

Users that are interested in VoxMM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

art-jang / LiTFiC
View on GitHub
[CVPR2025] Official code for Lost in Translation Found in Context
☆24Jan 14, 2026Updated 6 months ago
sungnyun / cav2vec
View on GitHub
(ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
☆16Apr 29, 2025Updated last year
kaistmm / fregrad
View on GitHub
[ICASSP 2024] Official code for FreGrad
☆35May 13, 2024Updated 2 years ago
Sindhu-Hegde / multivsr
View on GitHub
Official code for the paper "Scaling Multilingual Visual Speech Recognition"
☆20Aug 15, 2025Updated 11 months ago
ahaliassos / raven
View on GitHub
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
☆82Feb 27, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
kaistmm / Metric-UD-KWS
View on GitHub
Official code for Metric learning for user-defined keyword spotting
☆40Feb 21, 2024Updated 2 years ago
kaistmm / VoiceDiT
View on GitHub
[ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
☆52Apr 9, 2025Updated last year
ahaliassos / usr2
View on GitHub
PyTorch implementation of USR 2.0 (ICLR 2026)
☆15Apr 3, 2026Updated 3 months ago
cyhuang-tw / robust-vc
View on GitHub
☆11May 7, 2022Updated 4 years ago
sungnyun / avsr-temporal-dynamics
View on GitHub
(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
☆13Oct 22, 2024Updated last year
nguyenvulebinh / AVSRCocktail
View on GitHub
Audio-Visual Speech Recognition
☆26Jul 7, 2025Updated last year
zcxu-eric / AVA-AVD
View on GitHub
☆51Nov 24, 2022Updated 3 years ago
zds-potato / multilingual-phonetic-sv
View on GitHub
☆10Dec 22, 2023Updated 2 years ago
frankenliu / LOAE
View on GitHub
☆10Sep 25, 2024Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
Mashiro009 / slidespeech_dl
View on GitHub
☆24Sep 20, 2024Updated last year
kaistmm / AdaptVC
View on GitHub
☆17Jun 2, 2025Updated last year
MontaEllis / Framework-of-GAN-Inversion
View on GitHub
A Simplied Framework of GAN Inversion
☆16Mar 18, 2024Updated 2 years ago
roudimit / whisper-flamingo
View on GitHub
Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation
☆210Jul 29, 2025Updated 11 months ago
kaistmm / AVCD
View on GitHub
[NeurIPS 2025] AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding
☆27Nov 3, 2025Updated 8 months ago
MaikeZuefle / f-actor
View on GitHub
☆28Jul 17, 2026Updated last week
deepbrainai-research / koeba
View on GitHub
☆34Mar 17, 2023Updated 3 years ago
kaistmm / SlowFastSign
View on GitHub
[ICASSP 2024] Official code for Slowfast Network for Continuous Sign Language Recognition
☆66Jul 4, 2025Updated last year
YasserdahouML / VSR_test_set
View on GitHub
WildVSR
☆22Dec 13, 2023Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
choijeongsoo / utut
View on GitHub
[TASLP 2024] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
☆31Sep 6, 2024Updated last year
khfs / DuplexMamba
View on GitHub
☆18Mar 6, 2026Updated 4 months ago
jh-cha-prml / JELLY
View on GitHub
Code for the paper "JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis"
☆14Nov 5, 2024Updated last year
yiren-jian / BLIText
View on GitHub
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
☆26Dec 5, 2023Updated 2 years ago
JaesungHuh / VoxSRC2022
View on GitHub
VoxSRC2022 workshop development kit
☆19Jul 21, 2022Updated 4 years ago
Sreyan88 / LipGER
View on GitHub
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
☆19Jul 16, 2024Updated 2 years ago
JaesungHuh / VoxMovies
View on GitHub
Evaluation script for VoxMovies dataset in PyTorch
☆23Jan 12, 2024Updated 2 years ago
umbertocappellazzo / Omni-AVSR
View on GitHub
Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…
☆38Mar 10, 2026Updated 4 months ago
NAVER-INTEL-Co-Lab / gaudi-lavcap
View on GitHub
☆15Jan 24, 2025Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
JongSuk1 / AVCap
View on GitHub
☆11Sep 1, 2024Updated last year
DanielLin94144 / Full-Duplex-Bench
View on GitHub
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
☆240May 20, 2026Updated 2 months ago
linusericsson / ssl-invariances
View on GitHub
Official code for the paper "Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks".
☆16Dec 7, 2021Updated 4 years ago
glory20h / VoiceLDM
View on GitHub
VoiceLDM: Text-to-Speech with Environmental Context
☆194Aug 9, 2024Updated last year
atosystem / SSL_Interface
View on GitHub
Interface Design for Self-Supervised Speech Models, Accepted to Interspeech2024
☆16Nov 19, 2024Updated last year
JongSuk1 / EquiAV
View on GitHub
☆36Jan 20, 2025Updated last year
kaistmm / voxsim_trainer
View on GitHub
[INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset
☆24Sep 29, 2025Updated 9 months ago