sungnyun/avsr-temporal-dynamics

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/sungnyun/avsr-temporal-dynamics)

sungnyun / avsr-temporal-dynamics

(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition

☆13

Alternatives and similar repositories for avsr-temporal-dynamics

Users that are interested in avsr-temporal-dynamics are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

sungnyun / cav2vec
View on GitHub
(ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
☆16Apr 29, 2025Updated last year
david-gimeno / tailored-avsr
View on GitHub
Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"
☆15Feb 24, 2025Updated last year
YasserdahouML / VSR_test_set
View on GitHub
WildVSR
☆22Dec 13, 2023Updated 2 years ago
sungnyun / ARMHuBERT
View on GitHub
(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT
☆41Aug 29, 2024Updated last year
mispchallenge / MISP-ICME-AVSR
View on GitHub
☆17Jan 1, 2024Updated 2 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
ms-dot-k / AVSR
View on GitHub
PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin…
☆23Apr 3, 2024Updated 2 years ago
nguyenvulebinh / AVSRCocktail
View on GitHub
Audio-Visual Speech Recognition
☆26Jul 7, 2025Updated last year
ahaliassos / usr2
View on GitHub
PyTorch implementation of USR 2.0 (ICLR 2026)
☆15Apr 3, 2026Updated 3 months ago
chaufanglin / Normal2Whisper
View on GitHub
Implementation of "Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation"
☆14Oct 31, 2024Updated last year
Sreyan88 / LipGER
View on GitHub
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
☆19Jul 16, 2024Updated 2 years ago
Sindhu-Hegde / multivsr
View on GitHub
Official code for the paper "Scaling Multilingual Visual Speech Recognition"
☆20Aug 15, 2025Updated 11 months ago
joannahong / AV-RelScore
View on GitHub
Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…
☆35Jun 20, 2023Updated 3 years ago
raymin0223 / self-contrastive-learning
View on GitHub
Self-Contrastive Learning: Single-viewed Supervised Contrastive Framework using Sub-network (AAAI 2023)
☆21Oct 28, 2023Updated 2 years ago
raymin0223 / patch-mix_contrastive_learning
View on GitHub
Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification (INTERSPEECH 2023)
☆76Mar 11, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
umbertocappellazzo / PETL_AST
View on GitHub
This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" [IEEE MLSP 2024] …
☆41Jul 31, 2024Updated last year
kaistmm / VoxMM
View on GitHub
☆23May 11, 2026Updated 2 months ago
sungnyun / openssl-simcore
View on GitHub
(CVPR 2023) Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning
☆30Oct 3, 2023Updated 2 years ago
sungnyun / diffblender
View on GitHub
DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models
☆46Dec 21, 2023Updated 2 years ago
HumanMLLM / CoGenAV
View on GitHub
☆64Jul 1, 2025Updated last year
huckiyang / Interspeech23-Tutorial-Para-Efficient-Cross-Modal-Tutorial
View on GitHub
Interspeech Tutorial - Resource Efficient and Cross-Modal Learning Toward Foundation Modeling
☆15Oct 9, 2023Updated 2 years ago
choijeongsoo / av2av
View on GitHub
[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
☆48Sep 6, 2024Updated last year
roudimit / whisper-flamingo
View on GitHub
Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation
☆210Jul 29, 2025Updated last year
operrotin / GFM-IAIF
View on GitHub
Glottal Flow Model-based Iterative Adaptive Inverse Filtering
☆28Sep 28, 2020Updated 5 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
YUCHEN005 / UniVPM
View on GitHub
Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"
☆28Jun 21, 2023Updated 3 years ago
glory20h / FitHuBERT
View on GitHub
FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning (INTERSPEECH 2022)
☆19Nov 15, 2023Updated 2 years ago
nguyenvulebinh / AV-HuBERT-S2S
View on GitHub
Huggingface Implementation of AV-HuBERT on the MuAViC Dataset
☆19Mar 6, 2025Updated last year
ankitapasad / layerwise-analysis
View on GitHub
Layer-wise analysis of self-supervised pre-trained speech representations
☆135Oct 18, 2024Updated last year
zeta-chicken / toWhisper
View on GitHub
☆29Jul 12, 2024Updated 2 years ago
ServiceNow / data-augmentation-with-llms
View on GitHub
Data Augmentation for Intent Classification with Off-the-Shelf Large Language Models is a ServiceNow Research project
☆30Jun 12, 2023Updated 3 years ago
sungnyun / understanding-cdfsl
View on GitHub
(NeurIPS 2022) Understanding Cross-Domain Few-Shot Learning Based on Domain Similarity and Few-Shot Difficulty
☆34Mar 19, 2024Updated 2 years ago
linusericsson / ssl-invariances
View on GitHub
Official code for the paper "Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks".
☆16Dec 7, 2021Updated 4 years ago
JeongHun0716 / e-mvsr
View on GitHub
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)
☆20Mar 17, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
Tiago-Roxo / WASD
View on GitHub
☆20Updated this week
JeongHun0716 / MMS-LLaMA
View on GitHub
Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens…
☆48Jun 12, 2025Updated last year
ahaliassos / raven
View on GitHub
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
☆82Feb 27, 2025Updated last year
Bose / RAVEN
View on GitHub
☆20Oct 6, 2025Updated 9 months ago
ifnspaml / Perceptual-Weighting-Filter-Loss
View on GitHub
A perceptual weighting filter loss for DNN training in speech enhancement
☆24Apr 30, 2022Updated 4 years ago
uark-cviu / Right2Talk
View on GitHub
[ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach
☆20Aug 2, 2021Updated 4 years ago
jmandel / fun-with-formants
View on GitHub
Speech formant tracking code in Python
☆15Oct 10, 2013Updated 12 years ago