prajwalkr/vtp

Official Implementation of Visual Transformer Pooling for Lip reading

☆41

Alternatives and similar repositories for vtp

Users that are interested in vtp are comparing it to the libraries listed below.

prajwalkr / transpotter
Official implementation of Transpotter, published in BMVC 2021
☆16 · Aug 6, 2022 · Updated 3 years ago
mpc001 / Visual_Speech_Recognition_for_Multiple_Languages
Visual Speech Recognition for Multiple Languages
☆478 · Aug 17, 2023 · Updated 2 years ago
mpc001 / Lipreading_using_Temporal_Convolutional_Networks
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASS…
☆438 · May 18, 2023 · Updated 3 years ago
ms-dot-k / LRW_ID
The speaker-labeled information of LRW dataset, which is the outcome of the paper "Speaker-adaptive Lip Reading with User-dependent Paddi…
☆10 · Oct 12, 2023 · Updated 2 years ago
ms-dot-k / Visual-Audio-Memory
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)
☆22 · Apr 11, 2022 · Updated 4 years ago
VIPL-Audio-Visual-Speech-Understanding / learn-an-effective-lip-reading-model-without-pains
The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the…
☆168 · Sep 12, 2025 · Updated 10 months ago
VIPL-Audio-Visual-Speech-Understanding / deep-face-speechreading
Visual speech recognition with face inputs: code and models for F&G 2020 paper "Can We Read Speech Beyond the Lips? Rethinking RoI Select…
☆19 · Apr 12, 2021 · Updated 5 years ago
xing96 / MIM-lipreading
Code and model for paper <Mutual Information Maximization for Effective Lip Reading>
☆19 · Sep 4, 2020 · Updated 5 years ago
mpc001 / auto_avsr
Auto-AVSR: Lip-Reading Sentences Project
☆429 · Jan 8, 2025 · Updated last year
arxrean / LipRead-seq2seq
An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application.
☆10 · May 13, 2020 · Updated 6 years ago
facebookresearch / av_hubert
A self-supervised learning framework for audio-visual speech
☆995 · Dec 7, 2023 · Updated 2 years ago
ms-dot-k / Multi-head-Visual-Audio-Memory
PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI2022)
☆27 · Mar 9, 2024 · Updated 2 years ago
afourast / deep_lip_reading
Code and models for evaluating a state-of-the-art lip reading network
☆196 · Mar 24, 2023 · Updated 3 years ago
Sindhu-Hegde / gestsync
Official code for the paper "GestSync: Determining who is speaking without a talking head" published at BMVC 2023
☆48 · Sep 1, 2024 · Updated last year
burchim / AVEC
[WACV 2023] Audio-Visual Efficient Conformer (AVEC) for Robust Speech Recognition
☆101 · Feb 21, 2023 · Updated 3 years ago
Exgc / OpenSR
The official implementation of OpenSR (ACL2023 Oral)
☆17 · Nov 29, 2023 · Updated 2 years ago
JeongHun0716 / vsr-low
Visual Speech Recognition For Low-Resource Languages with Automatic Labels (ICASSP 2024)
☆17 · Mar 17, 2025 · Updated last year
smeetrs / deep_avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
☆244 · Feb 15, 2024 · Updated 2 years ago
JackSyu / Discriminative-Multi-modality-Speech-Recognition
TF code for our CVPR2020 paper "Discriminative Multi-modality Speech Recognition"
☆26 · Apr 27, 2022 · Updated 4 years ago
ms-dot-k / Visual-Context-Attentional-GAN
PyTorch implementation of "Lip to Speech Synthesis with Visual Context Attentional GAN" (NeurIPS2021)
☆25 · Mar 9, 2024 · Updated 2 years ago
mpc001 / end-to-end-lipreading
Pytorch code for End-to-End Audiovisual Speech Recognition
☆183 · Nov 18, 2022 · Updated 3 years ago
Exgc / AVMuST-TED
☆24 · Mar 30, 2024 · Updated 2 years ago
Sindhu-Hegde / multivsr
Official code for the paper "Scaling Multilingual Visual Speech Recognition"
☆20 · Aug 15, 2025 · Updated 11 months ago
YasserdahouML / visper
ViSpeR: Multilingual Audio-Visual Speech Recognition
☆58 · Apr 17, 2025 · Updated last year
shleee47 / Sound-Source-Localization
Sound Source Localization for AI Grand Challenge 2021
☆21 · Feb 7, 2022 · Updated 4 years ago
Bose / RAVEN
☆20 · Oct 6, 2025 · Updated 9 months ago
VIPL-Audio-Visual-Speech-Understanding / VIPL-AVSU-Group
Collection of works from VIPL-AVSU
☆50 · Updated this week
KrishnaDN / Keyword-Transformer
Implementation of the paper "Keyword Transformer: A Self-Attention Model for Keyword Spotting"
☆23 · May 19, 2021 · Updated 5 years ago
danmic / av-se
Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
☆222 · Apr 16, 2023 · Updated 3 years ago
lilianemomeni / KWS-Net
Seeing Wake Words: Audio-visual Keyword Spotting
☆67 · Sep 16, 2020 · Updated 5 years ago
JeongHun0716 / VoxLRS-SA
This repository contains the speaker labeled information of VoxCeleb2 and LRS3 audio-visual datasets. (AAAI 2025)
☆13 · Sep 6, 2024 · Updated last year
matthijsvk / TCDTIMITprocessing
processing and extracting of face and mouth image files out of the TCDTIMIT database
☆47 · Sep 22, 2020 · Updated 5 years ago
VIPL-Audio-Visual-Speech-Understanding / LipNet-PyTorch
The state-of-art PyTorch implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading" (https://arxi…
☆237 · Sep 21, 2022 · Updated 3 years ago
Spockuto / blockhash
Speed you SHA. A different hash style.
☆13 · Jun 13, 2016 · Updated 10 years ago
PeterouZh / Deep_Generative_Models
A collection of papers I am interested in.
☆29 · Apr 3, 2023 · Updated 3 years ago
facebookresearch / muavic
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
☆403 · Sep 11, 2023 · Updated 2 years ago
my-yy / sl_icmr2022
Code for "Self-Lifting: A Novel Framework For Unsupervised Voice-Face Association Learning,ICMR,2022"
☆15 · Oct 25, 2024 · Updated last year
MarshallT-99 / VALLR
☆29 · Oct 1, 2025 · Updated 9 months ago
csiro-robotics / iSICE
[CVPR2023] The official repository for paper "Learning Partial Correlation based Deep Visual Representation for Image Classification" To …
☆10 · Nov 21, 2023 · Updated 2 years ago