EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Visual Reasoning with Qwen2.5-Omni]
☆78May 18, 2025Updated 10 months ago
Alternatives and similar repositories for EchoInk
Users that are interested in EchoInk are comparing it to the libraries listed below.
- ICASSP2026 HumDial Challenge☆38Dec 13, 2025Updated 4 months ago
- The official repository of TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module…☆26Nov 18, 2025Updated 4 months ago
- [ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?☆44Nov 21, 2025Updated 4 months ago
- Awesome papers for affective computing with llm and mllm☆22Nov 26, 2025Updated 4 months ago
- ☆29Nov 4, 2025Updated 5 months ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…☆46Mar 3, 2025Updated last year
- [AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model☆76Apr 7, 2026Updated last week
- Adaptive Multimodal Reasoning via Reinforcement Learning☆23Jan 11, 2026Updated 3 months ago
- [CVPR 2026] VGGDrive: Empowering Vision-Language Models with Cross-View Geometric Grounding for Autonomous Driving☆85Mar 10, 2026Updated last month
- This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities☆96Mar 22, 2026Updated 3 weeks ago
- Onset-and-Offset-Aware Sound Event Detection☆22Feb 10, 2025Updated last year
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her☆60Apr 14, 2025Updated last year
- ☆17May 5, 2024Updated last year
- This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.☆52Jul 28, 2025Updated 8 months ago
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.☆24Nov 23, 2024Updated last year
- ☆20Mar 12, 2025Updated last year
- Implementation and experiment of the MusGConv paper.☆15Sep 6, 2024Updated last year
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆221Oct 12, 2025Updated 6 months ago
- ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. AAAI, 2025☆13Aug 25, 2025Updated 7 months ago
- Demo for Qwen2.5-VL-3B-Instruct on Axera device.☆15Sep 3, 2025Updated 7 months ago
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation☆29Dec 22, 2025Updated 3 months ago
- [CVPR 2026] Thinking with Programming Vision: Towards a Unified View for Thinking with Images☆68Jan 23, 2026Updated 2 months ago
- ☆19Sep 1, 2025Updated 7 months ago
- Modifications made to Qt for Snipaste.☆11Dec 5, 2024Updated last year
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs☆23Oct 15, 2024Updated last year
- Raspberry Pi qwen-omni voice assistant, no TTS/STT required☆16Apr 4, 2025Updated last year
- ☆17Mar 26, 2021Updated 5 years ago
- ☆34Sep 15, 2025Updated 7 months ago
- Code for the ICCV 2019 paper "Deep Multi-Model Fusion for Single-Image Dehazing"☆44Oct 10, 2021Updated 4 years ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration☆119Dec 3, 2025Updated 4 months ago
- Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"☆18Jun 21, 2023Updated 2 years ago
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix☆202Feb 25, 2026Updated last month
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.☆75Mar 18, 2025Updated last year
- The official implementation of the paper "Large Scale Knowledge Washing"☆10Jun 12, 2024Updated last year
- Pytorch implementation of "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal T…☆12Mar 9, 2024Updated 2 years ago
- Official code for our paper "Model Composition for Multimodal Large Language Models" (ACL 2024)☆31Jan 8, 2025Updated last year
- Towards Training-free Open-world Segmentation via Image Prompt Foundation Models☆18Nov 22, 2024Updated last year
- [NeurIPS 2025] HoliTom: Holistic Token Merging for Fast Video Large Language Models☆77Oct 10, 2025Updated 6 months ago
- [CVPR2024] Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model☆12Jul 31, 2024Updated last year