YUCHEN005/UniVPM

Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"

☆28

Alternatives and similar repositories for UniVPM

Users who are interested in UniVPM compare it to the repositories listed below.

YUCHEN005 / UNA-GAN
Code for paper "Unsupervised Noise adaptation using Data Simulation"
☆14 · May 16, 2024 · Updated 2 years ago
YUCHEN005 / GILA
Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"
☆18 · Jun 21, 2023 · Updated 3 years ago
YUCHEN005 / Gradient-Remedy
Code for paper "Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition"
☆21 · May 24, 2023 · Updated 3 years ago
YUCHEN005 / NASE
Code for paper "Noise-aware Speech Enhancement using Diffusion Probabilistic Model"
☆89 · Jun 10, 2024 · Updated 2 years ago
Hypotheses-Paradise / Hypo2Trans
Single-blind supplementary materials for NeurIPS 2023 submission
☆94 · Oct 30, 2024 · Updated last year
david-gimeno / tailored-avsr
Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"
☆15 · Feb 24, 2025 · Updated last year
YUCHEN005 / STAR-Adapt
Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
☆241 · May 24, 2024 · Updated 2 years ago
Hypotheses-Paradise / UADF
☆17 · May 5, 2024 · Updated 2 years ago
sungnyun / avsr-temporal-dynamics
(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
☆13 · Oct 22, 2024 · Updated last year
YUCHEN005 / GenTranslate
Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"
☆199 · Jul 22, 2024 · Updated 2 years ago
junhwanjang / visemenet-inference
3D Avatar Lip Synchronization from speech (JALI based face-rigging)
☆83 · Apr 13, 2022 · Updated 4 years ago
YasserdahouML / VSR_test_set
WildVSR
☆22 · Dec 13, 2023 · Updated 2 years ago
chaufanglin / Normal2Whisper
Implementation of "Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation"
☆14 · Oct 31, 2024 · Updated last year
zhengmidon / singaligner
a compact audio-to-phoneme aligner for singing voice
☆12 · Jan 17, 2024 · Updated 2 years ago
shirley-wu / daco
[NeurIPS 2024 D&B Track] DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
☆14 · Mar 5, 2025 · Updated last year
ms-dot-k / AVSR
PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin…
☆23 · Apr 3, 2024 · Updated 2 years ago
wujinzhong / Wav2Lip_TensorRT
☆29 · Oct 1, 2023 · Updated 2 years ago
nguyenvulebinh / AVSRCocktail
Audio-Visual Speech Recognition
☆26 · Jul 7, 2025 · Updated last year
huckiyang / Interspeech23-Tutorial-Para-Efficient-Cross-Modal-Tutorial
Interspeech Tutorial - Resource Efficient and Cross-Modal Learning Toward Foundation Modeling
☆15 · Oct 9, 2023 · Updated 2 years ago
swagshaw / Rainbow-Keywords
Rainbow Keywords - Official PyTorch Implementation
☆14 · Jun 27, 2024 · Updated 2 years ago
zouharvi / pwesuite
Suite for phonetic word embeddings, especially their evaluation and baseline models.
☆38 · Mar 3, 2025 · Updated last year
jetfontanilla / azure-viseme-json
Example code on how to generate viseme json
☆14 · Feb 23, 2023 · Updated 3 years ago
YoungSeng / ReprGesture
The ReprGesture entry to the GENEA Challenge 2022 (ICMI 2022)
☆16 · Nov 8, 2022 · Updated 3 years ago
joannahong / AV-RelScore
Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…
☆35 · Jun 20, 2023 · Updated 3 years ago
aflr-archive / viseme-to-video
Creates video from TTS output and viseme images.
☆16 · Jun 18, 2022 · Updated 4 years ago
zhangnn520 / digitalAvatarRealtime
DINet-based inference service for video streams and videos
☆17 · Nov 8, 2023 · Updated 2 years ago
liukuangxiangzi / audio2viseme
The code generates phonemes from audio features.
☆34 · Jun 15, 2021 · Updated 5 years ago
andreamad8 / ToDCL
Continual Learning for Task-Oriented Dialogue Systems
☆30 · Apr 21, 2022 · Updated 4 years ago
choijeongsoo / lip2speech-unit
[Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units
☆47 · Oct 26, 2024 · Updated last year
Magicboomliu / Viseme-Classification
A pipeline from dataset gathering, data annotation, model training, and model evaluation for viseme (visual sound phoneme) classification
☆15 · Jan 19, 2021 · Updated 5 years ago
zhang-wy15 / Attack_practical_asv
ICASSP 2021 accepted paper
☆20 · May 20, 2021 · Updated 5 years ago
NKU-HLT / PB-DSR
[Interspeech 2024] Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation
☆14 · Nov 28, 2024 · Updated last year
francescotonini / human-gaze-target-detection-transformer
An implementation of the paper "End-to-End Human-Gaze-Target Detection with Transformers"
☆20 · Updated this week
archiki / Robust-E2E-ASR
This repository contains the code for our upcoming paper An Investigation of End-to-End Models for Robust Speech Recognition at ICASSP 20…
☆49 · Dec 25, 2024 · Updated last year
gustavo-beck / wavebender-gan
☆25 · Sep 27, 2022 · Updated 3 years ago
aranciokov / FSMMDA_VideoRetrieval
☆10 · Nov 23, 2023 · Updated 2 years ago
choijeongsoo / av2av
[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
☆48 · Sep 6, 2024 · Updated last year
gunnxx / indonesian-mt-data
Benchmarking Multidomain English-Indonesian Machine Translation
☆16 · Dec 19, 2020 · Updated 5 years ago
siddharthverma314 / chai-naacl-2022
Code for CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning
☆23 · Jul 12, 2022 · Updated 4 years ago