umbertocappellazzo / Omni-AVSR
Official PyTorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 2026].
☆28 · Jan 18, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for Omni-AVSR
Users that are interested in Omni-AVSR are comparing it to the libraries listed below
- Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat… ☆56 · Jan 18, 2026 · Updated 3 weeks ago
- ☆63 · Jul 1, 2025 · Updated 7 months ago
- ☆94 · Feb 4, 2026 · Updated last week
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆13 · Jun 28, 2025 · Updated 7 months ago
- FIBO-Edit brings the power of structured prompt generation to image editing ☆27 · Jan 29, 2026 · Updated 2 weeks ago
- ☆89 · Jan 28, 2026 · Updated 2 weeks ago
- ☆63 · Jul 11, 2025 · Updated 7 months ago
- Code for Learning to Learn Language from Narrated Video ☆33 · Oct 3, 2023 · Updated 2 years ago
- ☆22 · Dec 11, 2025 · Updated 2 months ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark, an evaluation set for high school mathematical olympiad competitions ☆11 · Mar 27, 2025 · Updated 10 months ago
- [ICLR2026] The code for "Interp3D: Correspondence-Aware Interpolation for Generative Textured 3D Morphing." ☆21 · Jan 21, 2026 · Updated 3 weeks ago
- [CVPR2025] Official code for Lost in Translation Found in Context ☆23 · Jan 14, 2026 · Updated last month
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space" ☆24 · Jul 21, 2025 · Updated 6 months ago
- ☆11 · Oct 31, 2024 · Updated last year
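- KeyPresser Hardware is a powerful automation tool that uses an Arduino hardware device to provide automation features such as keyboard emulation and mouse operation. ☆32 · Jan 21, 2026 · Updated 3 weeks ago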
- ☆16 · Sep 18, 2025 · Updated 4 months ago
- Repository for Screen2AX paper ☆17 · Aug 6, 2025 · Updated 6 months ago
- ☆18 · Feb 16, 2025 · Updated last year
- Research code for "Towards multi-task learning of speech and speaker recognition" at https://arxiv.org/pdf/2302.12773.pdf ☆12 · Dec 2, 2024 · Updated last year
- PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis ☆34 · Oct 27, 2025 · Updated 3 months ago
- DO with Terraform and Ansible ☆11 · Jun 5, 2018 · Updated 7 years ago
- Markdown editor, Android Markdown editor, Android Markdown app, mobile Markdown tool, Ushio MD, Markdown live preview, syntax-highlighting editor, immersive writing tool, mobile writing tool, Markdown custom themes, auto… ☆27 · Feb 1, 2026 · Updated 2 weeks ago
- A framework for building speech-enabled websites. ☆10 · Jul 10, 2015 · Updated 10 years ago
- [arXiv 2025] ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models ☆35 · Aug 26, 2025 · Updated 5 months ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Oct 11, 2024 · Updated last year
- ☆12 · Apr 26, 2025 · Updated 9 months ago
- Data for evaluating GPT-4V ☆11 · Oct 26, 2023 · Updated 2 years ago
- ComfyUI version of Gemma3 ☆10 · Sep 6, 2025 · Updated 5 months ago
- Turn your prompt salad into sushi! A dev tool to analyze and improve everything your app sends to LLMs ☆21 · Sep 20, 2025 · Updated 4 months ago
- ☆10 · Oct 24, 2024 · Updated last year
- ☆28 · Sep 5, 2025 · Updated 5 months ago
- A multi-agent-driven real-time radio station ☆30 · Feb 8, 2026 · Updated last week
- G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering. ☆23 · Jan 31, 2026 · Updated 2 weeks ago
- ☆13 · Oct 25, 2024 · Updated last year
- An MCP server that runs AI-driven venture capitalist agents (Fred Wilson, Peter Thiel, etc.), whose thinking is continuously enriched by … ☆18 · May 12, 2025 · Updated 9 months ago
- Remove NotebookLM watermarks from slides. Local processing, no upload needed. ☆31 · Jan 15, 2026 · Updated last month
- Long Context Research ☆26 · Jan 26, 2026 · Updated 3 weeks ago
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding ☆13 · Jul 22, 2024 · Updated last year
- I am curating best Black Friday and Cyber Monday deals for developers, mostly learning resource to prepare for coding and system design i… ☆30 · Nov 26, 2025 · Updated 2 months ago