☆64Jul 1, 2025Updated 9 months ago
Alternatives and similar repositories for CoGenAV
Users that are interested in CoGenAV are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ViSpeR: Multilingual Audio-Visual Speech Recognition☆56Apr 17, 2025Updated 11 months ago
- Audio-Visual Speech Recognition☆21Jul 7, 2025Updated 9 months ago
- (ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation☆16Apr 29, 2025Updated 11 months ago
- Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…☆34Mar 10, 2026Updated last month
- Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"☆14Feb 24, 2025Updated last year
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- Official PyTorch implementation for "Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech …☆33May 11, 2025Updated 11 months ago
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units☆47Oct 26, 2024Updated last year
- HumanOmni☆223Mar 10, 2025Updated last year
- The speaker-labeled information of LRW dataset, which is the outcome of the paper "Speaker-adaptive Lip Reading with User-dependent Paddi…☆10Oct 12, 2023Updated 2 years ago
- [ICME 2025] DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation☆24Mar 25, 2025Updated last year
- [Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization☆61Mar 26, 2025Updated last year
- PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin…☆21Apr 3, 2024Updated 2 years ago
- ☆154Jul 31, 2025Updated 8 months ago
- ☆17Jan 1, 2024Updated 2 years ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation☆46Sep 6, 2024Updated last year
- Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat…☆61Jan 18, 2026Updated 2 months ago
- WildVSR☆21Dec 13, 2023Updated 2 years ago
- Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"☆28Jun 21, 2023Updated 2 years ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 9 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- Auto-AVSR: Lip-Reading Sentences Project☆409Jan 8, 2025Updated last year
- Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)☆20Mar 17, 2025Updated last year
- An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application.☆10May 13, 2020Updated 5 years ago
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- Implementation of Google's USM speech model in Pytorch☆35Mar 22, 2026Updated 2 weeks ago
- [WACV 2024] Code release for "VEATIC: Video-based Emotion and Affect Tracking in Context Dataset"☆21Jan 14, 2026Updated 2 months ago
- JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers☆17Jul 21, 2025Updated 8 months ago
- Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)☆22Apr 27, 2024Updated last year
- Official implementation of USR (NeurIPS 2024)☆39Dec 21, 2024Updated last year
- ☆17Jul 22, 2024Updated last year
- [WACV 2026] LASER: Lip Landmark Assisted Speaker Detection for Robustness official implemntation☆25Feb 26, 2026Updated last month
- ☆24Feb 20, 2024Updated 2 years ago
- ☆22Jan 17, 2025Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"☆18Jun 21, 2023Updated 2 years ago
- ☆17Jul 12, 2020Updated 5 years ago
- TF code for our CVPR2020 paper "Discriminative Multi-modality Speech Recognition"☆26Apr 27, 2022Updated 3 years ago
- [ECCV2024 offical]KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding☆34Jul 12, 2024Updated last year
- The MAVD represents Mandarin Audio-Visual dataset with Depth information. MAVD has a rich variety of modal data, including audio, RGB ima…☆20Apr 22, 2024Updated last year
- [NeurIPS 2024] Mixture of Experts for Audio-Visual Learning☆24Jan 19, 2025Updated last year
- ☆1,012Mar 24, 2025Updated last year