Official PyTorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 2026].
☆30 · Jan 18, 2026 · Updated last month
Alternatives and similar repositories for Omni-AVSR
Users who are interested in Omni-AVSR are comparing it to the libraries listed below.
- Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat… ☆56 · Jan 18, 2026 · Updated last month
- ☆62 · Jul 1, 2025 · Updated 8 months ago
- ☆96 · Feb 4, 2026 · Updated last month
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 8 months ago
- ☆89 · Jan 28, 2026 · Updated last month
- ☆63 · Jul 11, 2025 · Updated 7 months ago
- FIBO-Edit brings the power of structured prompt generation to image editing ☆30 · Jan 29, 2026 · Updated last month
- Code for Learning to Learn Language from Narrated Video ☆33 · Oct 3, 2023 · Updated 2 years ago
- ☆18 · Feb 16, 2025 · Updated last year
- ☆11 · Oct 31, 2024 · Updated last year
- Math24o: High School Olympiad Mathematics Chinese Benchmark ☆11 · Mar 27, 2025 · Updated 11 months ago
- Remove NotebookLM watermarks from slides. Local processing, no upload needed. ☆37 · Jan 15, 2026 · Updated last month
- [CVPR2025] Official code for Lost in Translation Found in Context ☆23 · Jan 14, 2026 · Updated last month
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space" ☆25 · Jul 21, 2025 · Updated 7 months ago
- ☆23 · Dec 11, 2025 · Updated 2 months ago
- ☆16 · Sep 18, 2025 · Updated 5 months ago
- [ICLR2026] The code for "Interp3D: Correspondence-Aware Interpolation for Generative Textured 3D Morphing." ☆25 · Jan 21, 2026 · Updated last month
- Data for evaluating GPT-4V ☆11 · Oct 26, 2023 · Updated 2 years ago
- Long Context Research ☆29 · Jan 26, 2026 · Updated last month
- ☆21 · Jun 16, 2025 · Updated 8 months ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Oct 11, 2024 · Updated last year
- Turn your prompt salad into sushi! A dev tool to analyze and improve everything your app sends to LLMs ☆21 · Sep 20, 2025 · Updated 5 months ago
- ComfyUI version of Gemma3 ☆10 · Sep 6, 2025 · Updated 6 months ago
- Repository for Screen2AX paper ☆19 · Aug 6, 2025 · Updated 7 months ago
- ☆10 · Oct 24, 2024 · Updated last year
- An MCP server that runs AI-driven venture capitalist agents (Fred Wilson, Peter Thiel, etc.), whose thinking is continuously enriched by … ☆20 · May 12, 2025 · Updated 9 months ago
- I am curating best Black Friday and Cyber Monday deals for developers, mostly learning resource to prepare for coding and system design i… ☆29 · Nov 26, 2025 · Updated 3 months ago
- ☆13 · Oct 25, 2024 · Updated last year
- ComfyUI custom node implementation of VideoMaMa for video matting with mask conditioning. ☆40 · Feb 9, 2026 · Updated last month
- PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis ☆33 · Oct 27, 2025 · Updated 4 months ago
- ☆12 · Apr 26, 2025 · Updated 10 months ago
- A framework for building speech-enabled websites. ☆10 · Jul 10, 2015 · Updated 10 years ago
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding ☆14 · Jul 22, 2024 · Updated last year
- G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering. ☆23 · Jan 31, 2026 · Updated last month
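- KeyPresser Hardware is a powerful automation tool that uses an Arduino hardware device to perform keyboard simulation, mouse operations, and other automated actions. ☆33 · Jan 21, 2026 · Updated last month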
- DO with Terraform and Ansible ☆11 · Jun 5, 2018 · Updated 7 years ago
- ☆27 · Sep 5, 2025 · Updated 6 months ago
- [ICLR 2026] RefAny3D: 3D Asset-Referenced Diffusion Models for Image Generation ☆30 · Feb 5, 2026 · Updated last month
- Research code for "Towards multi-task learning of speech and speaker recognition" at https://arxiv.org/pdf/2302.12773.pdf ☆12 · Dec 2, 2024 · Updated last year