smeetrs/deep_avsr

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/smeetrs/deep_avsr)

smeetrs / deep_avsr

A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.

☆244

Alternatives and similar repositories for deep_avsr

Users that are interested in deep_avsr are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

mpc001 / end-to-end-lipreading
View on GitHub
Pytorch code for End-to-End Audiovisual Speech Recognition
☆183Nov 18, 2022Updated 3 years ago
afourast / deep_lip_reading
View on GitHub
Code and models for evaluating a state-of-the-art lip reading network
☆196Mar 24, 2023Updated 3 years ago
georgesterpu / avsr-tf1
View on GitHub
Audio-Visual Speech Recognition using Sequence to Sequence Models
☆84Jul 10, 2020Updated 6 years ago
VIPL-Audio-Visual-Speech-Understanding / learn-an-effective-lip-reading-model-without-pains
View on GitHub
The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the…
☆168Sep 12, 2025Updated 10 months ago
mpc001 / Lipreading_using_Temporal_Convolutional_Networks
View on GitHub
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASS…
☆437May 18, 2023Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
mpc001 / Visual_Speech_Recognition_for_Multiple_Languages
View on GitHub
Visual Speech Recognition for Multiple Languages
☆478Aug 17, 2023Updated 2 years ago
LUMIA-Group / Leveraging-Self-Supervised-Learning-for-AVSR
View on GitHub
Official PyTorch implementation of paper Leveraging Unimodal Self Supervised Learning for Multimodal Audio-Visual Speech Recognition (ACL…
☆67Jul 13, 2022Updated 4 years ago
zexupan / MuSE
View on GitHub
☆42Nov 22, 2024Updated last year
georgesterpu / Taris
View on GitHub
Transformer-based online speech recognition system with TensorFlow 2
☆26Jan 22, 2021Updated 5 years ago
facebookresearch / av_hubert
View on GitHub
A self-supervised learning framework for audio-visual speech
☆995Dec 7, 2023Updated 2 years ago
mpc001 / auto_avsr
View on GitHub
Auto-AVSR: Lip-Reading Sentences Project
☆428Jan 8, 2025Updated last year
burchim / AVEC
View on GitHub
[WACV 2023] Audio-Visual Efficient Conformer (AVEC) for Robust Speech Recognition
☆101Feb 21, 2023Updated 3 years ago
lin9x / AV-Sepformer
View on GitHub
☆65Jun 28, 2023Updated 3 years ago
joannahong / AV-RelScore
View on GitHub
Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…
☆35Jun 20, 2023Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
TaoRuijie / SEANet
View on GitHub
Code for Audio-Visual Target Speaker Extraction with Selective Auditory Attention (TASLP)
☆32Feb 28, 2025Updated last year
ahaliassos / raven
View on GitHub
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
☆82Feb 27, 2025Updated last year
dr-pato / audio_visual_speech_enhancement
View on GitHub
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
☆112Mar 19, 2024Updated 2 years ago
VIPL-Audio-Visual-Speech-Understanding / LipNet-PyTorch
View on GitHub
The state-of-art PyTorch implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading" (https://arxi…
☆237Sep 21, 2022Updated 3 years ago
ajinkyaT / Lip_Reading_in_the_Wild_AVSR
View on GitHub
Audio-Visual Speech Recognition using Deep Learning
☆61Nov 14, 2018Updated 7 years ago
ms-dot-k / Multi-head-Visual-Audio-Memory
View on GitHub
PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI2022)
☆27Mar 9, 2024Updated 2 years ago
aispeech-lab / advr-avss
View on GitHub
Pytorch implementation of our paper: Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training.
☆18Jul 11, 2022Updated 4 years ago
danmic / av-se
View on GitHub
Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
☆222Apr 16, 2023Updated 3 years ago
JackSyu / Discriminative-Multi-modality-Speech-Recognition
View on GitHub
TF code for our CVPR2020 paper "Discriminative Multi-modality Speech Recognition"
☆26Apr 27, 2022Updated 4 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
tstafylakis / Lipreading-ResNet
View on GitHub
Torch code for using Residual Networks with LSTMs for Lipreading
☆99Oct 8, 2018Updated 7 years ago
VIPL-Audio-Visual-Speech-Understanding / VIPL-AVSU-Group
View on GitHub
Collection of works from VIPL-AVSU
☆50Updated this week
zexupan / reentry
View on GitHub
☆18Nov 22, 2024Updated last year
krantiparida / awesome-audio-visual
View on GitHub
A curated list of different papers and datasets in various areas of audio-visual processing
☆775Jan 30, 2024Updated 2 years ago
ms-dot-k / Visual-Audio-Memory
View on GitHub
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)
☆22Apr 11, 2022Updated 4 years ago
prajwalkr / vtp
View on GitHub
Official Implementation of Visual Transformer Pooling for Lip reading
☆41Aug 8, 2022Updated 3 years ago
DanielMengLiu / DeepLip
View on GitHub
deep-learning based audio-visual lip bometrics
☆15May 9, 2023Updated 3 years ago
joonson / syncnet_trainer
View on GitHub
Disentangled Speech Embeddings using Cross-Modal Self-Supervision
☆167Apr 12, 2020Updated 6 years ago
GeWu-Lab / awesome-audiovisual-learning
View on GitHub
A curated list of audio-visual learning methods and datasets.
☆288Dec 3, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
david-gimeno / tailored-avsr
View on GitHub
Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"
☆15Feb 24, 2025Updated last year
lilianemomeni / KWS-Net
View on GitHub
Seeing Wake Words: Audio-visual Keyword Spotting
☆67Sep 16, 2020Updated 5 years ago
facebookresearch / VisualVoice
View on GitHub
Audio-Visual Speech Separation with Cross-Modal Consistency
☆250Jul 25, 2023Updated 2 years ago
cogmhear / avse_challenge
View on GitHub
COG-MHEAR Audio-Visual Speech Enhancement Challenge
☆48Feb 17, 2026Updated 5 months ago
joonson / syncnet_python
View on GitHub
Out of time: automated lip sync in the wild
☆894Apr 17, 2026Updated 3 months ago
Megamind22 / lstArab100words
View on GitHub
Deep Visual Speech Recognition in arabic words
☆17Oct 18, 2023Updated 2 years ago
lzuwei / end-to-end-multiview-lipreading
View on GitHub
End to End Multiview Lip Reading
☆10Jan 26, 2018Updated 8 years ago