☆62Jul 1, 2025Updated 8 months ago
Alternatives and similar repositories for CoGenAV
Users that are interested in CoGenAV are comparing it to the libraries listed below
Sorting:
- Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…☆29Jan 18, 2026Updated last month
- ViSpeR: Multilingual Audio-Visual Speech Recognition☆55Apr 17, 2025Updated 10 months ago
- Audio-Visual Speech Recognition☆20Jul 7, 2025Updated 7 months ago
- (ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation☆15Apr 29, 2025Updated 10 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation☆199Jul 29, 2025Updated 7 months ago
- [ICME 2025] DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation☆24Mar 25, 2025Updated 11 months ago
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units☆47Oct 26, 2024Updated last year
- Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"☆14Feb 24, 2025Updated last year
- The speaker-labeled information of LRW dataset, which is the outcome of the paper "Speaker-adaptive Lip Reading with User-dependent Paddi…☆10Oct 12, 2023Updated 2 years ago
- An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application.☆10May 13, 2020Updated 5 years ago
- [arXiv 2025] ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models☆36Aug 26, 2025Updated 6 months ago
- ☆18Sep 25, 2025Updated 5 months ago
- [Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization☆60Mar 26, 2025Updated 11 months ago
- (SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition☆13Oct 22, 2024Updated last year
- Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens…☆46Jun 12, 2025Updated 8 months ago
- Auto-AVSR: Lip-Reading Sentences Project☆404Jan 8, 2025Updated last year
- WildVSR☆21Dec 13, 2023Updated 2 years ago
- [SIGGRAPH Asia 2025] CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling☆44Sep 26, 2025Updated 5 months ago
- JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers☆17Jul 21, 2025Updated 7 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- 🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games"☆24Dec 14, 2025Updated 2 months ago
- Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat…☆56Jan 18, 2026Updated last month
- [ICLR 2026] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents☆34Feb 1, 2026Updated last month
- MegaRAG: Multimodal Graph-based RAG☆36Sep 16, 2025Updated 5 months ago
- Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition☆18Jul 16, 2024Updated last year
- A unified robotic manipulation learning framework☆21Sep 4, 2025Updated 5 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions☆29Oct 9, 2025Updated 4 months ago
- ☆25Jun 18, 2025Updated 8 months ago
- UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions☆46Dec 16, 2025Updated 2 months ago
- Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)☆20Mar 17, 2025Updated 11 months ago
- A Comprehensive Dataset for Advanced Image Generation and Editing}☆31Oct 2, 2025Updated 4 months ago
- [WACV 2026] LASER: Lip Landmark Assisted Speaker Detection for Robustness official implemntation☆22Updated this week
- Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation☆110Updated this week
- ☆37Sep 16, 2024Updated last year
- Visual Speech Recognition for Multiple Languages☆459Aug 17, 2023Updated 2 years ago
- PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin…☆20Apr 3, 2024Updated last year
- The code for "MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking"☆19Jan 25, 2025Updated last year
- ☆17Jul 22, 2024Updated last year