allenye66 / Computer-Vision-Lip-Reading-2.0
A speech recognition system using 3D CNNs. The final model achieves 97.4% training accuracy and 99.2% testing accuracy, and the system can accurately recognize spoken words from a set of predefined words in real time.
☆66 · Updated 2 years ago
Alternatives and similar repositories for Computer-Vision-Lip-Reading-2.0
Users that are interested in Computer-Vision-Lip-Reading-2.0 are comparing it to the libraries listed below
- A pipeline to read lips and generate speech for the read content, i.e., Lip-to-Speech Synthesis. ☆93 · Updated 6 months ago
- [Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization ☆60 · Updated 10 months ago
- Official code implementation for the ICLR paper "LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading" ☆85 · Updated last year
- PyTorch implementation of "Lip to Speech Synthesis in the Wild with Multi-task Learning" (ICASSP 2023) ☆70 · Updated last year
- The repo contains an audio emotion detection model, a facial emotion detection model, and a model that combines both these models to predic… ☆92 · Updated 2 years ago
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation ☆199 · Updated 6 months ago
- [NeurIPS 2024] This is the official repo of the paper "Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Li… ☆135 · Updated last year
- Official source code for the paper: EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing. ☆37 · Updated 8 months ago
- The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection) ☆165 · Updated 10 months ago
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units ☆47 · Updated last year
- The repository for Springer IJCV 2025 (LR-ASD: Lightweight and Robust Network for Active Speaker Detection) ☆88 · Updated 10 months ago
- Visual Speech Recognition for Multiple Languages ☆458 · Updated 2 years ago
- [AAAI 2025] VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization ☆53 · Updated last year
- SUTD 50.039 Deep Learning Course Project (2022 Spring) ☆81 · Updated 2 years ago
- Auto-AVSR: Lip-Reading Sentences Project ☆402 · Updated last year
- Dynamic and static models for real-time facial emotion recognition ☆179 · Updated last year
- Audio deepfake detection system based on a CNN ☆66 · Updated 2 years ago
- A PyTorch implementation of Lip2Wav ☆50 · Updated 3 years ago
- [CVPR] MARLIN: Masked Autoencoder for facial video Representation LearnINg ☆261 · Updated 10 months ago
- Official PyTorch implementation for "Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech … ☆32 · Updated 9 months ago
- Speech Emotion Detection using SVM, Decision Tree, Random Forest, MLP, and CNNs with different architectures ☆39 · Updated 2 years ago
- Official code for the CVPR'24 paper Diff-BGM ☆71 · Updated last year
- ☆63 · Updated 7 months ago
- 😎 Awesome lists about Speech Emotion Recognition ☆101 · Updated last year
- ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection' ☆444 · Updated 2 years ago
- [AAAI 2024] Style2Talker - Official PyTorch Implementation ☆51 · Updated 6 months ago
- Speech Emotion Recognition ☆43 · Updated 2 years ago
- Deep Visual Speech Recognition for Arabic words ☆16 · Updated 2 years ago
- ☆48 · Updated 2 years ago
- An implementation of Speech Emotion Recognition based on the HuBERT model, training with PyTorch and the HuggingFace framework, and fine-tuning … ☆33 · Updated 3 years ago