EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Visual Reasoning with Qwen2.5-Omni]
☆78 · May 18, 2025 · Updated last year
Alternatives and similar repositories for EchoInk
Users that are interested in EchoInk are comparing it to the libraries listed below.
- A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning (Awesome & Benchmark) ☆105 · Feb 27, 2026 · Updated 3 months ago
- Implementation of the NAACL 2024 paper "Unveiling the Generalization Power of Fine-Tuned Large Language Models" ☆11 · Mar 14, 2024 · Updated 2 years ago
- [ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM? ☆47 · Nov 21, 2025 · Updated 6 months ago
- ☆29 · Nov 4, 2025 · Updated 6 months ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… ☆48 · Mar 3, 2025 · Updated last year
- Awesome papers for affective computing with LLMs and MLLMs ☆24 · Nov 26, 2025 · Updated 6 months ago
- Adaptive Multimodal Reasoning via Reinforcement Learning ☆23 · Jan 11, 2026 · Updated 4 months ago
- ☆13 · Sep 21, 2022 · Updated 3 years ago
- [CVPR 2026] VGGDrive: Empowering Vision-Language Models with Cross-View Geometric Grounding for Autonomous Driving ☆91 · May 8, 2026 · Updated 2 weeks ago
- This is for the ACL 2025 Findings paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities Models ☆100 · Mar 22, 2026 · Updated 2 months ago
- Onset-and-Offset-Aware Sound Event Detection ☆21 · Feb 10, 2025 · Updated last year
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ☆60 · Apr 14, 2025 · Updated last year
- ☆17 · May 5, 2024 · Updated 2 years ago
- This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed. ☆51 · Jul 28, 2025 · Updated 9 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆229 · Apr 13, 2026 · Updated last month
- Implementation and experiments for the MusGConv paper. ☆17 · Sep 6, 2024 · Updated last year
- Demo for Qwen2.5-VL-3B-Instruct on an Axera device. ☆16 · Sep 3, 2025 · Updated 8 months ago
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario… ☆59 · Sep 4, 2024 · Updated last year
- Android version of Snipaste ☆10 · Aug 1, 2023 · Updated 2 years ago
- Modifications made to Qt for Snipaste. ☆11 · Dec 5, 2024 · Updated last year
- wav2vec2 audio classification for prosodic boundary detection and other tasks ☆42 · Aug 11, 2023 · Updated 2 years ago
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs ☆47 · Sep 19, 2025 · Updated 8 months ago
- [NeurIPS 2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs ☆23 · Oct 15, 2024 · Updated last year
- ☆17 · Mar 26, 2021 · Updated 5 years ago
- ☆34 · Sep 15, 2025 · Updated 8 months ago
- Code for the ICCV 2019 paper "Deep Multi-Model Fusion for Single-Image Dehazing" ☆44 · Oct 10, 2021 · Updated 4 years ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ☆122 · Dec 3, 2025 · Updated 5 months ago
- Code for the paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition" ☆18 · Jun 21, 2023 · Updated 2 years ago
- Raspberry Pi qwen-omni voice assistant that works without TTS/STT ☆18 · Apr 4, 2025 · Updated last year
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix ☆210 · Feb 25, 2026 · Updated 3 months ago
- One Discrete Word for Visual Reasoning Overtakes Agentic and Latent Methods ☆118 · May 15, 2026 · Updated last week
- The official implementation of the paper "Large Scale Knowledge Washing" ☆10 · Jun 12, 2024 · Updated last year
- PyTorch implementation of "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal T… ☆12 · Apr 29, 2026 · Updated 3 weeks ago
- Towards Training-free Open-world Segmentation via Image Prompt Foundation Models ☆18 · Nov 22, 2024 · Updated last year
- [CVPR 2024] Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model ☆12 · Jul 31, 2024 · Updated last year
- Data generator for the stereo sound event localization and detection task of the DCASE 2025 challenge ☆16 · Jul 17, 2025 · Updated 10 months ago
- ☆37 · Mar 31, 2026 · Updated last month
- [NeurIPS 2025] HoliTom: Holistic Token Merging for Fast Video Large Language Models ☆80 · Oct 10, 2025 · Updated 7 months ago
- ☆14 · Sep 2, 2024 · Updated last year