VIPL-Audio-Visual-Speech-Understanding/VIPL-AVSU-Group

Collection of works from VIPL-AVSU

☆50

Alternatives and similar repositories for VIPL-AVSU-Group

Users who are interested in VIPL-AVSU-Group are comparing it to the libraries listed below.

xing96 / MIM-lipreading
Code and model for paper <Mutual Information Maximization for Effective Lip Reading>
☆19 · Sep 4, 2020 · Updated 5 years ago
VIPL-Audio-Visual-Speech-Understanding / learn-an-effective-lip-reading-model-without-pains
The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the…
☆168 · Sep 12, 2025 · Updated 10 months ago
deeplsd / Merkel-Podcast-Corpus
This dataset is presented in the paper Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video…
☆12 · Sep 21, 2022 · Updated 3 years ago
mysee1989 / GraphJigsaw
Code for the paper: Graph Jigsaw Learning for Cartoon Face Recognition
☆10 · Jul 1, 2022 · Updated 4 years ago
mispchallenge / misp2021_baseline
☆29 · Jun 15, 2022 · Updated 4 years ago
VIPL-Audio-Visual-Speech-Understanding / LRW1000--CAS-VSR-W1k
DenseNet3D Model In "LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild", https://arxiv.org/abs/1810.069…
☆123 · Mar 13, 2026 · Updated 4 months ago
jingyunx / Deformation-Flow-Based-Two-stream-Network-for-Lip-Reading
☆15 · Dec 11, 2021 · Updated 4 years ago
hkzhang95 / Awesome-CV-bibfiles
A collection of bibfiles related to computer vision for efficiency
☆17 · May 29, 2021 · Updated 5 years ago
smeetrs / deep_avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
☆244 · Feb 15, 2024 · Updated 2 years ago
celebrity-audio-collection / videoprocess
CN-Celeb, a large-scale Chinese celebrities dataset published by Center for Speech and Language Technology (CSLT) at Tsinghua University.
☆80 · Nov 9, 2019 · Updated 6 years ago
mpc001 / Lipreading_using_Temporal_Convolutional_Networks
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASS…
☆437 · May 18, 2023 · Updated 3 years ago
ahaliassos / usr2
PyTorch implementation of USR 2.0 (ICLR 2026)
☆15 · Apr 3, 2026 · Updated 3 months ago
arxrean / LipRead-seq2seq
An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application.
☆10 · May 13, 2020 · Updated 6 years ago
XuyangGuo / STD-GAN
Instance-level Facial Attributes Editing (CVIU 2021)
☆15 · Jul 19, 2022 · Updated 4 years ago
mpc001 / end-to-end-lipreading
Pytorch code for End-to-End Audiovisual Speech Recognition
☆183 · Nov 18, 2022 · Updated 3 years ago
VIPL-Audio-Visual-Speech-Understanding / LipNet-PyTorch
The state-of-the-art PyTorch implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading" (https://arxi…
☆237 · Sep 21, 2022 · Updated 3 years ago
Rock-100 / Detecting-Text-in-Natural-Image-with-Connectionist-Text-Proposal-Network
Implementation of Detecting Text in Natural Image with Connectionist Text Proposal Network (Zhi Tian, et)
☆15 · Apr 18, 2019 · Updated 7 years ago
prajwalkr / vtp
Official Implementation of Visual Transformer Pooling for Lip reading
☆41 · Aug 8, 2022 · Updated 3 years ago
whzikaros / g2pL
The implementation of g2pL with a new open dataset.
☆16 · May 14, 2023 · Updated 3 years ago
revsic / torch-retriever-vc
PyTorch implementation of Retriever: Learning Content-Style Representation
☆12 · Jan 27, 2023 · Updated 3 years ago
JuanFMontesinos / VoViT
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
☆35 · Mar 18, 2023 · Updated 3 years ago
fashion-challenge / fashion-challenge.github.io
☆24 · Sep 17, 2018 · Updated 7 years ago
SpeechColab / PySpeechColab
A library of speech gadgets.
☆15 · Oct 15, 2022 · Updated 3 years ago
LynnHo / AttGAN-Cartoon-Tensorflow
☆17 · Dec 8, 2020 · Updated 5 years ago
nxsEdson / MLCR
Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition (NeurIPS 2019)
☆66 · Dec 1, 2019 · Updated 6 years ago
LiuDongyang6 / FCFD
Official implementation of the paper "Function-Consistent Feature Distillation" (ICLR 2023)
☆30 · Jul 5, 2023 · Updated 3 years ago
Yuanbo2020 / Audio-Visual-VAD
☆13 · May 9, 2022 · Updated 4 years ago
ondrejklejch / learning_to_adapt
Coordinate-wise meta-learner for speaker adaptation of ASR models.
☆20 · Dec 30, 2019 · Updated 6 years ago
MartaYang / LPE
Codes for the WACV 2023 paper: "Semantic Guided Latent Parts Embedding for Few-Shot Learning"
☆141 · Jan 28, 2023 · Updated 3 years ago
DataoceanAI / CNVSRC2023Baseline
Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)
☆23 · Apr 27, 2024 · Updated 2 years ago
spkgyk / RTFS-Net
Official code release for "RTFS-Net: Recurrent time-frequency modelling for efficient audio-visual speech separation", accepted ICLR 2024
☆51 · Oct 14, 2025 · Updated 9 months ago
michaelzhang-ai / vid2vid
A modified version of vid2vid for Speech2Video, Text2Video Paper
☆35 · Jun 4, 2023 · Updated 3 years ago
wtomin / UncertainEmotion
☆10 · Feb 24, 2022 · Updated 4 years ago
jiapuwang / TDN-Triplet-Distributor-Network-for-Knowledge-Graph-Completion
☆137 · May 31, 2023 · Updated 3 years ago
stoneMo / MGN
Official implementation for MGN
☆20 · Dec 22, 2022 · Updated 3 years ago
seanexp / LipMovement
Detects lip movement and checks if a person is speaking
☆19 · May 4, 2018 · Updated 8 years ago
dr-pato / audio_visual_speech_enhancement
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
☆112 · Mar 19, 2024 · Updated 2 years ago
jiapuwang / QDN-A-Quadruplet-Distributor-Network-for-Temporal-Knowledge-Graph-Completion
☆136 · Sep 24, 2024 · Updated last year
youcaiSUN / MuSe-Wild_2020
☆12 · Aug 24, 2020 · Updated 5 years ago