sungnyun/cav2vec

(ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation

☆16

Alternatives and similar repositories for cav2vec

Users who are interested in cav2vec are comparing it to the repositories listed below.

sungnyun / avsr-temporal-dynamics
(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
☆13 · Oct 22, 2024 · Updated last year
sungnyun / ARMHuBERT
(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT
☆41 · Aug 29, 2024 · Updated last year
mispchallenge / MISP-ICME-AVSR
☆17 · Jan 1, 2024 · Updated 2 years ago
ahaliassos / usr2
PyTorch implementation of USR 2.0 (ICLR 2026)
☆15 · Apr 3, 2026 · Updated 3 months ago
YasserdahouML / VSR_test_set
WildVSR
☆22 · Dec 13, 2023 · Updated 2 years ago
david-gimeno / tailored-avsr
Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"
☆15 · Feb 24, 2025 · Updated last year
ms-dot-k / AVSR
PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin…
☆23 · Apr 3, 2024 · Updated 2 years ago
raymin0223 / self-contrastive-learning
Self-Contrastive Learning: Single-viewed Supervised Contrastive Framework using Sub-network (AAAI 2023)
☆21 · Oct 28, 2023 · Updated 2 years ago
talhanai / wer-sigtest
Script to perform statistical significance tests between ASR hypotheses.
☆23 · Aug 13, 2017 · Updated 8 years ago
joannahong / AV-RelScore
Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…
☆35 · Jun 20, 2023 · Updated 3 years ago
Bose / RAVEN
☆20 · Oct 6, 2025 · Updated 9 months ago
sungnyun / diffblender
DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models
☆46 · Dec 21, 2023 · Updated 2 years ago
etri / kmsav
☆14 · Oct 25, 2024 · Updated last year
raymin0223 / patch-mix_contrastive_learning
Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification (INTERSPEECH 2023)
☆76 · Mar 11, 2025 · Updated last year
choijeongsoo / av2av
[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
☆48 · Sep 6, 2024 · Updated last year
01Zhangbw / Speech-and-audio-papers-Top-Conference
☆141 · Jan 24, 2026 · Updated 6 months ago
Sindhu-Hegde / multivsr
Official code for the paper "Scaling Multilingual Visual Speech Recognition"
☆20 · Aug 15, 2025 · Updated 11 months ago
Emrys365 / torch_stft
PyTorch-based implementations of the short-time Fourier transform
☆14 · Jul 21, 2025 · Updated last year
umbertocappellazzo / Omni-AVSR
Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…
☆38 · Mar 10, 2026 · Updated 4 months ago
chaufanglin / Normal2Whisper
Implementation of "Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation"
☆14 · Oct 31, 2024 · Updated last year
JeongHun0716 / MMS-LLaMA
Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens…
☆48 · Jun 12, 2025 · Updated last year
nguyenvulebinh / AVSRCocktail
Audio-Visual Speech Recognition
☆26 · Jul 7, 2025 · Updated last year
MoonJuhan / tistory-readme-stats
Tistory Readme Stat Card
☆11 · Mar 27, 2024 · Updated 2 years ago
YUCHEN005 / UniVPM
Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"
☆28 · Jun 21, 2023 · Updated 3 years ago
roudimit / whisper-flamingo
Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation
☆210 · Jul 29, 2025 · Updated 11 months ago
Sreyan88 / LipGER
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
☆19 · Jul 16, 2024 · Updated 2 years ago
AMLAB-Wakayama / gammachirp-filterbank
An original package of the dynamic compressive gammachirp filterbank (dcGC-FB)
☆14 · Jul 7, 2026 · Updated 2 weeks ago
sungnyun / understanding-cdfsl
(NeurIPS 2022) Understanding Cross-Domain Few-Shot Learning Based on Domain Similarity and Few-Shot Difficulty
☆34 · Mar 19, 2024 · Updated 2 years ago
NAVER-INTEL-Co-Lab / gaudi-lavcap
☆15 · Jan 24, 2025 · Updated last year
WikiChao / VisAH
[CVPR 2025] PyTorch implementation of the paper "Learning to Highlight Audio by Watching Movies"
☆15 · Oct 1, 2025 · Updated 9 months ago
euiin / SMART
SMART introduces a novel test-time framework where Small Language Models (SLMs) reason step-by-step, and Large Language Models (LLMs) pro…
☆12 · Jul 9, 2025 · Updated last year
HumanMLLM / CoGenAV
☆64 · Jul 1, 2025 · Updated last year
iclr2024mcmi / ICLRMCMI
Official implementation of Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information
☆12 · Sep 28, 2023 · Updated 2 years ago
KevRiver / MSB
Mad Square's Brawl is a 2D Android platformer PvP game.
☆17 · Feb 15, 2023 · Updated 3 years ago
zds-potato / multilingual-phonetic-sv
☆10 · Dec 22, 2023 · Updated 2 years ago
yl4467 / singer
☆15 · Feb 22, 2025 · Updated last year
mmmmayi / ExPO
Official implementation of the paper "ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification"
☆14 · Mar 14, 2025 · Updated last year
joymallyac / Fair-SMOTE
GitHub repo for the FSE 2021 paper "Bias in Machine Learning Software: Why? How? What to do?"
☆17 · May 7, 2022 · Updated 4 years ago
jmandel / fun-with-formants
Speech formant tracking code in Python
☆15 · Oct 10, 2013 · Updated 12 years ago