nguyenvulebinh/AV-HuBERT-S2S

Huggingface Implementation of AV-HuBERT on the MuAViC Dataset

☆19

Alternatives and similar repositories for AV-HuBERT-S2S

Users who are interested in AV-HuBERT-S2S compare it to the repositories listed below.

MCoRec / mcorec_baseline
CHiME-9 Task 1 - MCoRec baseline
☆28 · Jan 13, 2026 · Updated 6 months ago
nguyenvulebinh / AVSRCocktail
Audio-Visual Speech Recognition
☆26 · Jul 7, 2025 · Updated last year
JeongHun0716 / vsr-low
Visual Speech Recognition For Low-Resource Languages with Automatic Labels (ICASSP 2024)
☆17 · Mar 17, 2025 · Updated last year
umbertocappellazzo / Llama-AVSR
Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat…
☆64 · Jan 18, 2026 · Updated 6 months ago
ms-dot-k / AVSR
PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin…
☆23 · Apr 3, 2024 · Updated 2 years ago
JeongHun0716 / MMS-LLaMA
Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens…
☆48 · Jun 12, 2025 · Updated last year
YasserdahouML / VSR_test_set
WildVSR
☆22 · Dec 13, 2023 · Updated 2 years ago
Sindhu-Hegde / multivsr
Official code for the paper "Scaling Multilingual Visual Speech Recognition"
☆20 · Aug 15, 2025 · Updated 11 months ago
HumanMLLM / CoGenAV
☆64 · Jul 1, 2025 · Updated last year
Chris10M / Lip2Speech
A pipeline to read lips and generate speech for the read content, i.e Lip to Speech Synthesis.
☆95 · Jul 23, 2025 · Updated last year
YasserdahouML / visper
ViSpeR: Multilingual Audio-Visual Speech Recognition
☆59 · Apr 17, 2025 · Updated last year
nikhilraghav29 / diarizen-tutorial
DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline.
☆22 · Apr 24, 2026 · Updated 3 months ago
lu-wo / whisbert
babyLM WhisBERT code
☆19 · May 27, 2024 · Updated 2 years ago
sungnyun / avsr-temporal-dynamics
(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
☆13 · Oct 22, 2024 · Updated last year
ahaliassos / usr
Official implementation of USR (NeurIPS 2024)
☆40 · Dec 21, 2024 · Updated last year
ahaliassos / raven
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
☆82 · Feb 27, 2025 · Updated last year
JeongHun0716 / zero-avsr
Official PyTorch implementation for "Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech …
☆37 · May 11, 2025 · Updated last year
JeongHun0716 / e-mvsr
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)
☆20 · Mar 17, 2025 · Updated last year
KAIST-AILab / SyncVSR
[Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
☆61 · Mar 26, 2025 · Updated last year
DanielMengLiu / AudioVisualLip
☆25 · Feb 20, 2024 · Updated 2 years ago
facebookresearch / muavic
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
☆404 · Sep 11, 2023 · Updated 2 years ago
chimechallenge / C8DASR-Baseline-NeMo
NeMo: a toolkit for conversational AI
☆13 · May 4, 2024 · Updated 2 years ago
umbertocappellazzo / Omni-AVSR
Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…
☆38 · Mar 10, 2026 · Updated 4 months ago
FesianXu / LipNet_ChineseWordsClassification
Chinese words classification using lipnet with pytorch
☆40 · Nov 18, 2019 · Updated 6 years ago
SarthakYadav / audiomae-plusplus-official
Official repository for the paper "AudioMAE++: learning better masked audio representations with SwiGLU FFNs"
☆15 · Apr 30, 2026 · Updated 2 months ago
Exgc / AVMuST-TED
☆24 · Mar 30, 2024 · Updated 2 years ago
fgnt / meeteval
MeetEval - A meeting transcription evaluation toolkit
☆172 · Jan 27, 2026 · Updated 6 months ago
chenqi008 / V2C
Pytorch implementation for “V2C: Visual Voice Cloning”
☆35 · Jan 28, 2023 · Updated 3 years ago
ajinkyaT / Lip_Reading_in_the_Wild_AVSR
Audio-Visual Speech Recognition using Deep Learning
☆61 · Nov 14, 2018 · Updated 7 years ago
nguyenvulebinh / spoken-norm
Transformation spoken text to written text
☆31 · May 14, 2024 · Updated 2 years ago
JeongHun0716 / Personalized-Lip-Reading
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language (AAAI 2025)
☆24 · Jun 29, 2026 · Updated last month
SpeechEE / SpeechEE
☆11 · Aug 20, 2025 · Updated 11 months ago
GalaxyCong / EmoDubber
[CVPR 2025] Official source codes for the paper: EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing.
☆38 · Jun 3, 2025 · Updated last year
ms-dot-k / Visual-Audio-Memory
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)
☆22 · Apr 11, 2022 · Updated 4 years ago
ski-net / lipnet
LipNet with gluon
☆23 · Nov 22, 2022 · Updated 3 years ago
boun-tabi / SQuAD-TR
☆11 · Jun 8, 2024 · Updated 2 years ago
VincentHancoder / AToM
The official implementation of work "AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward".
☆19 · Mar 25, 2025 · Updated last year
amazon-science / avgen-eval-toolkit
☆20 · Feb 5, 2026 · Updated 5 months ago
roudimit / whisper-flamingo
Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation
☆210 · Jul 29, 2025 · Updated last year