DanielMengLiu/AudioVisualLip

☆25

Alternatives and similar repositories for AudioVisualLip

Users who are interested in AudioVisualLip are comparing it to the repositories listed below.

nii-yamagishilab / Attention_Backend_for_ASV
Attention Backend for Automatic Speaker Verification with Multiple Enrollment Utterances
☆50 · Oct 27, 2022 · Updated 3 years ago
Bose / RAVEN
☆20 · Oct 6, 2025 · Updated 9 months ago
zaocan666 / DyViSE
Dynamic vision-guided speaker embedding for audio-visual speaker diarization
☆12 · Jul 5, 2022 · Updated 4 years ago
lin9x / AV-Sepformer
☆65 · Jun 28, 2023 · Updated 3 years ago
tuanct1997 / Federated-Learning-ASR-based-on-wav2vec-2.0
☆18 · Mar 13, 2024 · Updated 2 years ago
TaoRuijie / AVCleanse
ICASSP 2023: 'Speaker recognition with two-step multi-modal deep cleansing'
☆44 · Oct 31, 2022 · Updated 3 years ago
jwr1995 / WD-TCN
☆11 · Aug 5, 2022 · Updated 3 years ago
JeongHun0716 / Personalized-Lip-Reading
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language (AAAI 2025)
☆24 · Jun 29, 2026 · Updated 3 weeks ago
umbertocappellazzo / Llama-AVSR
Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat…
☆64 · Jan 18, 2026 · Updated 6 months ago
guxm2021 / SVT_SpeechBrain
[TOMM 2024] Automatic Lyric Transcription and Automatic Music Transcription from Multimodal Singing
☆28 · Aug 30, 2024 · Updated last year
itsyoavshalev / End-to-End-Lip-Synchronization-with-a-Temporal-AutoEncoder
☆22 · Mar 31, 2022 · Updated 4 years ago
Jiang-Yidi / TS-TalkNet
INTERSPEECH 2023: Target Active Speaker Detection with Audio-visual Cues
☆61 · May 29, 2023 · Updated 3 years ago
YUCHEN005 / MIR-GAN
Code for paper "MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recogni…
☆16 · Jun 21, 2023 · Updated 3 years ago
roger-tseng / av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58 · Apr 17, 2024 · Updated 2 years ago
DataoceanAI / CNVSRC2023Baseline
Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)
☆23 · Apr 27, 2024 · Updated 2 years ago
Benjamin-Etheredge / mlp-mixer-keras
☆14 · May 24, 2021 · Updated 5 years ago
JuanFMontesinos / VoViT
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
☆35 · Mar 18, 2023 · Updated 3 years ago
ASLP-lab / Smart-Glass-Challenge
☆17 · Jun 16, 2026 · Updated last month
zengchang233 / CrossSinger
The source code for the paper CrossSinger (ASRU 2023)
☆18 · Oct 12, 2023 · Updated 2 years ago
firasl / BoCF
Official implementation of 'Bag of Color Features For Color Constancy' (BoCF) accepted in IEEE Transactions on Image Processing (TIP) 202…
☆13 · Mar 7, 2022 · Updated 4 years ago
aleXiehta / Causal-SE
Official Implementation of "Inference and Denoise: Causal Inference-based Neural Speech Enhancement"
☆28 · Feb 26, 2023 · Updated 3 years ago
YasserdahouML / visper
ViSpeR: Multilingual Audio-Visual Speech Recognition
☆58 · Apr 17, 2025 · Updated last year
corot / turtlebot_arm
The turtlebot_arm package provides bringup, description, and utilities for using the TurtleBot arm.
☆10 · Jun 11, 2022 · Updated 4 years ago
zhang-wy15 / Attack_practical_asv
ICASSP 2021 accepted paper
☆20 · May 20, 2021 · Updated 5 years ago
SSTC-Challenge / SSTC2024_baseline_system
☆12 · Jun 14, 2024 · Updated 2 years ago
deeplsd / Merkel-Podcast-Corpus
This dataset is presented in the paper Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video…
☆12 · Sep 21, 2022 · Updated 3 years ago
tavihalperin / AV-sync
Python implementation of the paper "Dynamic Temporal Alignment of Speech to Lips"
☆32 · May 16, 2019 · Updated 7 years ago
DanielMengLiu / DeepLip
Deep-learning-based audio-visual lip biometrics
☆15 · May 9, 2023 · Updated 3 years ago
backspacetg / distilXLSR
Models and code for the INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model
☆13 · Mar 30, 2025 · Updated last year
MiukkaZh / MGT
Learning Domain-Invariant Transformation for Speaker Verification.
☆11 · Jun 13, 2023 · Updated 3 years ago
zds-potato / multilingual-phonetic-sv
☆10 · Dec 22, 2023 · Updated 2 years ago
TaoRuijie / Loss-Gated-Learning
ICASSP 2022: 'Self-supervised Speaker Recognition with Loss-gated Learning'
☆92 · May 29, 2023 · Updated 3 years ago
joannahong / AV-RelScore
Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…
☆35 · Jun 20, 2023 · Updated 3 years ago
noiseux1523 / NIST-SRE-2019
Score Normalization for NIST 2019 Speaker Recognition Evaluation
☆10 · Nov 8, 2019 · Updated 6 years ago
v-iashin / SparseSync
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)
☆56 · Jan 29, 2024 · Updated 2 years ago
choijeongsoo / av2av
[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
☆48 · Sep 6, 2024 · Updated last year
Aolius / semi-fst
Code for ACL 2022 paper "Semi-Supervised Formality Style Transfer with Consistency Training".
☆17 · May 21, 2022 · Updated 4 years ago
ndpniraj / node-js-news-app-backend
☆12 · Jul 8, 2021 · Updated 5 years ago
fgnt / speaker_reassignment
Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment
☆14 · Feb 5, 2025 · Updated last year