LUMIA-Group/Leveraging-Self-Supervised-Learning-for-AVSR

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/LUMIA-Group/Leveraging-Self-Supervised-Learning-for-AVSR)

LUMIA-Group / Leveraging-Self-Supervised-Learning-for-AVSR

Official PyTorch implementation of paper Leveraging Unimodal Self Supervised Learning for Multimodal Audio-Visual Speech Recognition (ACL 2022)

☆67

Alternatives and similar repositories for Leveraging-Self-Supervised-Learning-for-AVSR

Users that are interested in Leveraging-Self-Supervised-Learning-for-AVSR are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ms-dot-k / Multi-head-Visual-Audio-Memory
View on GitHub
PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI2022)
☆27Mar 9, 2024Updated 2 years ago
mpc001 / Visual_Speech_Recognition_for_Multiple_Languages
View on GitHub
Visual Speech Recognition for Multiple Languages
☆478Aug 17, 2023Updated 2 years ago
smeetrs / deep_avsr
View on GitHub
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
☆244Feb 15, 2024Updated 2 years ago
facebookresearch / av_hubert
View on GitHub
A self-supervised learning framework for audio-visual speech
☆995Dec 7, 2023Updated 2 years ago
burchim / AVEC
View on GitHub
[WACV 2023] Audio-Visual Efficient Conformer (AVEC) for Robust Speech Recognition
☆101Feb 21, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
mispchallenge / MISP2021-AVSR
View on GitHub
repository for paper "Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis"
☆18Jun 17, 2022Updated 4 years ago
VIPL-Audio-Visual-Speech-Understanding / learn-an-effective-lip-reading-model-without-pains
View on GitHub
The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the…
☆168Sep 12, 2025Updated 10 months ago
mpc001 / end-to-end-lipreading
View on GitHub
Pytorch code for End-to-End Audiovisual Speech Recognition
☆183Nov 18, 2022Updated 3 years ago
ms-dot-k / Visual-Audio-Memory
View on GitHub
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)
☆22Apr 11, 2022Updated 4 years ago
prajwalkr / vtp
View on GitHub
Official Implementation of Visual Transformer Pooling for Lip reading
☆41Aug 8, 2022Updated 3 years ago
yunyikristy / global_local
View on GitHub
☆14Oct 7, 2021Updated 4 years ago
ahaliassos / raven
View on GitHub
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
☆82Feb 27, 2025Updated last year
seungheondoh / hi_kia
View on GitHub
wake-up word emotion recognition [APSIPA 2022]
☆17Nov 11, 2022Updated 3 years ago
JackSyu / Discriminative-Multi-modality-Speech-Recognition
View on GitHub
TF code for our CVPR2020 paper "Discriminative Multi-modality Speech Recognition"
☆26Apr 27, 2022Updated 4 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
HLTCHKUST / CI-AVSR
View on GitHub
Code repository for the Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR) dataset.
☆42Jul 16, 2024Updated 2 years ago
joannahong / AV-RelScore
View on GitHub
Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…
☆35Jun 20, 2023Updated 3 years ago
Exgc / AVMuST-TED
View on GitHub
☆24Mar 30, 2024Updated 2 years ago
xjchenGit / awesome-audio-visual-deepfake
View on GitHub
awesome-audio-visual-robustness
☆11Jan 27, 2024Updated 2 years ago
RanaCM / DSU-AVO
View on GitHub
Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023
☆12May 13, 2024Updated 2 years ago
ms-dot-k / LRW_ID
View on GitHub
The speaker-labeled information of LRW dataset, which is the outcome of the paper "Speaker-adaptive Lip Reading with User-dependent Paddi…
☆10Oct 12, 2023Updated 2 years ago
Sreyan88 / LipGER
View on GitHub
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
☆19Jul 16, 2024Updated 2 years ago
GeWu-Lab / awesome-audiovisual-learning
View on GitHub
A curated list of audio-visual learning methods and datasets.
☆288Dec 3, 2024Updated last year
NeuroWave-ai / CUCVAE-TTS
View on GitHub
☆25Mar 12, 2022Updated 4 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
jingyunx / Deformation-Flow-Based-Two-stream-Network-for-Lip-Reading
View on GitHub
☆15Dec 11, 2021Updated 4 years ago
JuanFMontesinos / Acappella-YNet
View on GitHub
Official implementation of A cappella: Audio-visual Singing VoiceSeparation, from BMVC21
☆18May 14, 2022Updated 4 years ago
yangjingyuan / ConstDecoder
View on GitHub
☆11Oct 24, 2022Updated 3 years ago
mpc001 / Lipreading_using_Temporal_Convolutional_Networks
View on GitHub
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASS…
☆438May 18, 2023Updated 3 years ago
WangHelin1997 / SpecAugment-plus
View on GitHub
A Pytorch implementation of the paper : SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification
☆34Jun 25, 2021Updated 5 years ago
AmeenAli / VideoMatch
View on GitHub
☆14Jan 5, 2022Updated 4 years ago
karthikbhamidipati / multi-task-speech-classification
View on GitHub
Multi-Task Speech classification of accent and gender of an english speaker on Mozilla's common voice dataset
☆28Jul 17, 2026Updated last week
KAIST-AILab / SyncVSR
View on GitHub
[Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
☆61Mar 26, 2025Updated last year
YasserdahouML / visper
View on GitHub
ViSpeR: Multilingual Audio-Visual Speech Recognition
☆58Apr 17, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
shvdiwnkozbw / Self-supervised-Video-Concept
View on GitHub
Code for Static and Dynamic Concepts for Self-supervised Video Representation Learning.
☆11Jul 28, 2022Updated 3 years ago
zexupan / MuSE
View on GitHub
☆42Nov 22, 2024Updated last year
stoneMo / SLAVC
View on GitHub
Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)
☆22Dec 6, 2022Updated 3 years ago
uark-cviu / Right2Talk
View on GitHub
[ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach
☆20Aug 2, 2021Updated 4 years ago
IU-SAIGE / pse
View on GitHub
Efficient Personalized Speech Enhancement through Self-Supervised Learning
☆23Mar 12, 2023Updated 3 years ago
liuzhejun / XWbank_LipReading
View on GitHub
2019年“创青春·交子杯”新网银行高校金融科技挑战赛初赛、决赛思路代码分享
☆28Dec 11, 2019Updated 6 years ago
tstafylakis / Lipreading-ResNet
View on GitHub
Torch code for using Residual Networks with LSTMs for Lipreading
☆99Oct 8, 2018Updated 7 years ago