facebookresearch/av_hubert

A self-supervised learning framework for audio-visual speech

☆995

Alternatives and similar repositories for av_hubert

Users who are interested in av_hubert are comparing it to the repositories listed below.

mpc001 / Visual_Speech_Recognition_for_Multiple_Languages
Visual Speech Recognition for Multiple Languages
☆478 · Aug 17, 2023 · Updated 2 years ago
ahaliassos / raven
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
☆82 · Feb 27, 2025 · Updated last year
smeetrs / deep_avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
☆244 · Feb 15, 2024 · Updated 2 years ago
facebookresearch / muavic
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
☆403 · Sep 11, 2023 · Updated 2 years ago
mpc001 / auto_avsr
Auto-AVSR: Lip-Reading Sentences Project
☆428 · Jan 8, 2025 · Updated last year
mpc001 / Lipreading_using_Temporal_Convolutional_Networks
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASS…
☆437 · May 18, 2023 · Updated 3 years ago
Sxjdwang / TalkLip
☆429 · Nov 1, 2023 · Updated 2 years ago
LUMIA-Group / Leveraging-Self-Supervised-Learning-for-AVSR
Official PyTorch implementation of paper Leveraging Unimodal Self Supervised Learning for Multimodal Audio-Visual Speech Recognition (ACL…
☆67 · Jul 13, 2022 · Updated 4 years ago
facebookresearch / speech-resynthesis
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-S…
☆416 · Aug 29, 2023 · Updated 2 years ago
roger-tseng / av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58 · Apr 17, 2024 · Updated 2 years ago
prajwalkr / vtp
Official Implementation of Visual Transformer Pooling for Lip reading
☆41 · Aug 8, 2022 · Updated 3 years ago
s3prl / s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
☆2,558 · Mar 12, 2026 · Updated 4 months ago
ms-dot-k / Multi-head-Visual-Audio-Memory
PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI2022)
☆27 · Mar 9, 2024 · Updated 2 years ago
joannahong / AV-RelScore
Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…
☆35 · Jun 20, 2023 · Updated 3 years ago
roudimit / whisper-flamingo
Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation
☆210 · Jul 29, 2025 · Updated 11 months ago
facebookresearch / textlesslib
Library for Textless Spoken Language Processing
☆559 · Aug 29, 2023 · Updated 2 years ago
facebookresearch / VisualVoice
Audio-Visual Speech Separation with Cross-Modal Consistency
☆250 · Jul 25, 2023 · Updated 2 years ago
krantiparida / awesome-audio-visual
A curated list of different papers and datasets in various areas of audio-visual processing
☆775 · Jan 30, 2024 · Updated 2 years ago
facebookresearch / AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
☆673 · Apr 5, 2024 · Updated 2 years ago
joonson / syncnet_python
Out of time: automated lip sync in the wild
☆894 · Apr 17, 2026 · Updated 3 months ago
danmic / av-se
Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
☆222 · Apr 16, 2023 · Updated 3 years ago
microsoft / UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
☆486 · Apr 5, 2024 · Updated 2 years ago
Exgc / AVMuST-TED
☆24 · Mar 30, 2024 · Updated 2 years ago
nguyenvulebinh / AVSRCocktail
Audio-Visual Speech Recognition
☆26 · Jul 7, 2025 · Updated last year
GeWu-Lab / awesome-audiovisual-learning
A curated list of audio-visual learning methods and datasets.
☆288 · Dec 3, 2024 · Updated last year
VIPL-Audio-Visual-Speech-Understanding / learn-an-effective-lip-reading-model-without-pains
The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the…
☆168 · Sep 12, 2025 · Updated 10 months ago
facebookresearch / AudioDec
An Open-source Streaming High-fidelity Neural Audio Codec
☆510 · Mar 4, 2025 · Updated last year
facebookresearch / CPC_audio
An implementation of the Contrastive Predictive Coding (CPC) method to train audio features in an unsupervised fashion.
☆374 · Oct 12, 2021 · Updated 4 years ago
facebookresearch / voxpopuli
A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation
☆574 · Apr 2, 2023 · Updated 3 years ago
ms-dot-k / LRW_ID
The speaker-labeled information of LRW dataset, which is the outcome of the paper "Speaker-adaptive Lip Reading with User-dependent Paddi…
☆10 · Oct 12, 2023 · Updated 2 years ago
ahaliassos / usr
Official implementation of USR (NeurIPS 2024)
☆40 · Dec 21, 2024 · Updated last year
ZhangXInFD / SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…
☆658 · Jun 9, 2024 · Updated 2 years ago
KevinMIN95 / StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
☆253 · Feb 9, 2022 · Updated 4 years ago
atosystem / SpeechCLIP
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022
☆120 · Nov 25, 2022 · Updated 3 years ago
jasonppy / word-discovery
Word Discovery in Visually Grounded, Self-Supervised Speech Models
☆27 · Dec 4, 2023 · Updated 2 years ago
Sally-SH / VSP-LLM
☆346 · Mar 17, 2025 · Updated last year
SpeechColab / GigaSpeech
Large, modern dataset for speech recognition
☆731 · Feb 26, 2024 · Updated 2 years ago
TencentGameMate / chinese_speech_pretrain
Chinese speech pretrained models
☆1,211 · Aug 23, 2024 · Updated last year
Srijith-rkr / Whispering-LLaMA
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
☆271 · May 19, 2024 · Updated 2 years ago