VIPL-Audio-Visual-Speech-Understanding / AVSU-VIPL

Collection of works from VIPL-AVSU

☆43

Alternatives and similar repositories for AVSU-VIPL:

Users who are interested in AVSU-VIPL are comparing it to the repositories listed below.

JackSyu / Discriminative-Multi-modality-Speech-Recognition
TF code for our CVPR2020 paper "Discriminative Multi-modality Speech Recognition"
☆25 · Updated 3 years ago
xing96 / MIM-lipreading
Code and model for paper <Mutual Information Maximization for Effective Lip Reading>
☆20 · Updated 4 years ago
fuankarion / active-speakers-context
Code for the Active Speakers in Context Paper (CVPR2020)
☆54 · Updated 3 years ago
NirHeaven / D3D
The proposed method in LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild
☆25 · Updated 6 years ago
arxrean / LipRead-seq2seq
An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application.
☆10 · Updated 4 years ago
VIPL-Audio-Visual-Speech-Understanding / deep-face-speechreading
Visual speech recognition with face inputs: code and models for F&G 2020 paper "Can We Read Speech Beyond the Lips? Rethinking RoI Select…
☆17 · Updated 4 years ago
uark-cviu / Right2Talk
[ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach
☆20 · Updated 3 years ago
LUMIA-Group / Leveraging-Self-Supervised-Learning-for-AVSR
Official PyTorch implementation of paper Leveraging Unimodal Self Supervised Learning for Multimodal Audio-Visual Speech Recognition (ACL…
☆65 · Updated 2 years ago
tuanchien / asd
Active Speaker Detection
☆19 · Updated 4 years ago
VIPL-Audio-Visual-Speech-Understanding / Lipreading-DenseNet3D
DenseNet3D Model In "LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild", https://arxiv.org/abs/1810.069…
☆118 · Updated 4 years ago
okankop / ASDNet
Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset
☆61 · Updated 3 years ago
joonson / syncnet_trainer
Disentangled Speech Embeddings using Cross-Modal Self-Supervision
☆159 · Updated 5 years ago
vskadandale / vocalist
Official repository for the paper VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
☆66 · Updated last year
ms-dot-k / Multi-head-Visual-Audio-Memory
PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI2022)
☆27 · Updated last year
afourast / avobjects
Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"
☆113 · Updated 4 years ago
VIPL-Audio-Visual-Speech-Understanding / learn-an-effective-lip-reading-model-without-pains
The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the…
☆159 · Updated last year
zcxu-eric / Ego4d_TalkNet_ASD
☆20 · Updated 3 years ago
MohammedAlghamdi / talking-heads-acm-mm
Talking Head from Speech Audio using a Pre-trained Image Generator
☆23 · Updated last year
walkoncross / voxceleb2-download-zyf
Tools for downloading VoxCeleb2 dataset
☆29 · Updated last year
SheldonTsui / SepStereo_ECCV2020
Codebase for the paper "Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation" (ECCV2020)
☆72 · Updated 4 years ago
v-iashin / SparseSync
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)
☆52 · Updated last year
YapengTian / CCOL-CVPR21
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
☆24 · Updated 3 years ago
jingyunx / Deformation-Flow-Based-Two-stream-Network-for-Lip-Reading
☆16 · Updated 3 years ago
lelechen63 / 3d_gan
☆35 · Updated 6 years ago
lelechen63 / talking-head-generation-survey
Official github repo for paper "What comprises a good talking-head video generation?: A Survey and Benchmark"
☆90 · Updated 2 years ago
changil / avspeech-downloader
AVSpeech downloader
☆67 · Updated 6 years ago
joannahong / Lip2Wav-pytorch
a PyTorch implementation of Lip2Wav
☆50 · Updated 2 years ago
Jiang-Yidi / TS-TalkNet
INTERSPEECH2023: Target Active Speaker Detection with Audio-visual Cues
☆51 · Updated last year
zfang399 / AlignNet
AlignNet: A Unifying Approach to Audio-Visual Alignment (WACV 2020)
☆33 · Updated 4 years ago
cyrta / voxceleb
mirror of VoxCeleb dataset - a large-scale speaker identification dataset
☆71 · Updated 5 years ago