georgesterpu/avsr-tf1

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/georgesterpu/avsr-tf1)

georgesterpu / avsr-tf1

Audio-Visual Speech Recognition using Sequence to Sequence Models

☆84

Alternatives and similar repositories for avsr-tf1

Users that are interested in avsr-tf1 are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ajinkyaT / Lip_Reading_in_the_Wild_AVSR
View on GitHub
Audio-Visual Speech Recognition using Deep Learning
☆61Nov 14, 2018Updated 7 years ago
georgesterpu / Taris
View on GitHub
Transformer-based online speech recognition system with TensorFlow 2
☆26Jan 22, 2021Updated 5 years ago
smeetrs / deep_avsr
View on GitHub
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
☆244Feb 15, 2024Updated 2 years ago
mpc001 / end-to-end-lipreading
View on GitHub
Pytorch code for End-to-End Audiovisual Speech Recognition
☆183Nov 18, 2022Updated 3 years ago
georgesterpu / pyVSR
View on GitHub
Python toolkit for Visual Speech Recognition
☆37Jun 10, 2020Updated 6 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
afourast / deep_lip_reading
View on GitHub
Code and models for evaluating a state-of-the-art lip reading network
☆196Mar 24, 2023Updated 3 years ago
tstafylakis / Lipreading-ResNet
View on GitHub
Torch code for using Residual Networks with LSTMs for Lipreading
☆99Oct 8, 2018Updated 7 years ago
LeeYongHyeok / DCM_vgg_transformer
View on GitHub
Dual cross modality attention audio-visual speech recognition model based on vgg transformer with hybrid CTC/attention architecture using…
☆14Jul 2, 2020Updated 6 years ago
VIPL-Audio-Visual-Speech-Understanding / LRW1000--CAS-VSR-W1k
View on GitHub
DenseNet3D Model In "LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild", https://arxiv.org/abs/1810.069…
☆123Mar 13, 2026Updated 4 months ago
lzuwei / ip-avsr
View on GitHub
Audio Visual Speech Recognition
☆23Aug 9, 2017Updated 8 years ago
sailordiary / LipNet-PyTorch
View on GitHub
"LipNet: End-to-End Sentence-level Lipreading" in PyTorch
☆70Sep 9, 2019Updated 6 years ago
dr-pato / audio_visual_speech_enhancement
View on GitHub
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
☆112Mar 19, 2024Updated 2 years ago
artem179 / WLAS
View on GitHub
The implementation of 'Watch, Listen, Attend and Spell’ (WLAS) network that learns to transcribe videos of mouth motion to character on p…
☆11Mar 23, 2018Updated 8 years ago
lzuwei / end-to-end-multiview-lipreading
View on GitHub
End to End Multiview Lip Reading
☆10Jan 26, 2018Updated 8 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
ms-dot-k / LRW_ID
View on GitHub
The speaker-labeled information of LRW dataset, which is the outcome of the paper "Speaker-adaptive Lip Reading with User-dependent Paddi…
☆10Oct 12, 2023Updated 2 years ago
DanielMengLiu / DeepLip
View on GitHub
deep-learning based audio-visual lip bometrics
☆15May 9, 2023Updated 3 years ago
vskadandale / vocalist
View on GitHub
Official repository for the paper VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
☆73Apr 7, 2024Updated 2 years ago
ms-dot-k / Visual-Audio-Memory
View on GitHub
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)
☆22Apr 11, 2022Updated 4 years ago
sungnyun / avsr-temporal-dynamics
View on GitHub
(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
☆13Oct 22, 2024Updated last year
mpc001 / Lipreading_using_Temporal_Convolutional_Networks
View on GitHub
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASS…
☆437May 18, 2023Updated 3 years ago
liu12366262626 / AlignVSR
View on GitHub
Visual Speech Recongnition
☆21Dec 24, 2024Updated last year
nguyenvulebinh / AVSRCocktail
View on GitHub
Audio-Visual Speech Recognition
☆26Jul 7, 2025Updated last year
lilianemomeni / KWS-Net
View on GitHub
Seeing Wake Words: Audio-visual Keyword Spotting
☆67Sep 16, 2020Updated 5 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
pandeydivesh15 / AVSR-Deep-Speech
View on GitHub
Google Summer of Code 2017 Project: Development of Speech Recognition Module for Red Hen Lab
☆44Aug 29, 2017Updated 8 years ago
ahaliassos / usr2
View on GitHub
PyTorch implementation of USR 2.0 (ICLR 2026)
☆15Apr 3, 2026Updated 3 months ago
matthijsvk / TCDTIMITprocessing
View on GitHub
processing and extracting of face and mouth image files out of the TCDTIMIT database
☆47Sep 22, 2020Updated 5 years ago
JackSyu / Discriminative-Multi-modality-Speech-Recognition
View on GitHub
TF code for our CVPR2020 paper "Discriminative Multi-modality Speech Recognition"
☆26Apr 27, 2022Updated 4 years ago
mispchallenge / MISP-ICME-AVSR
View on GitHub
☆17Jan 1, 2024Updated 2 years ago
joseph-zhong / LipReading
View on GitHub
Speech Recognition without audio input
☆144May 5, 2026Updated 2 months ago
david-gimeno / tailored-avsr
View on GitHub
Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"
☆15Feb 24, 2025Updated last year
arxrean / LipRead-seq2seq
View on GitHub
An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application.
☆10May 13, 2020Updated 6 years ago
afperezm / acoustic-images-distillation
View on GitHub
Code for the paper: Audio-Visual Model Distillation Using Acoustic Images
☆21Mar 24, 2023Updated 3 years ago
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
valiakon / MultimodalAnalysis_SpeakerDiarization
View on GitHub
The project tries to solve a speaker diarization problem using audio features, face recognition and video feature extraction from face im…
☆16Feb 10, 2019Updated 7 years ago
Joovvhan / MelNet
View on GitHub
PyTorch implementation of MelNet
☆10Aug 24, 2019Updated 6 years ago
astorfi / lip-reading-deeplearning
View on GitHub
Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
☆1,905Nov 7, 2022Updated 3 years ago
SSahuDS / Lipreading-Using-Mutimodal-Speech-Recognition
View on GitHub
Multimodal Speech Recognition for phoneme level prediction using Audio-Visual data from TCDTIMIT dataset implementing RNNs with LSTMs for…
☆15Jul 27, 2023Updated 2 years ago
andrewowens / multisensory
View on GitHub
Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
☆225Jul 17, 2019Updated 7 years ago
joannahong / AV-RelScore
View on GitHub
Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…
☆35Jun 20, 2023Updated 3 years ago
YasserdahouML / VSR_test_set
View on GitHub
WildVSR
☆22Dec 13, 2023Updated 2 years ago