HarryHsing / EchoInk
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Visual Reasoning with Qwen2.5-Omni]
☆22 · Updated last week
Alternatives and similar repositories for EchoInk
Users interested in EchoInk are comparing it to the libraries listed below.
- ☆75 · Updated 4 months ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆107 · Updated 3 weeks ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆24 · Updated 4 months ago
- ☆73 · Updated 11 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆54 · Updated last week
- Official implementation of MIA-DPO ☆57 · Updated 3 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆54 · Updated last month
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆44 · Updated 6 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆23 · Updated 3 weeks ago
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning! ☆38 · Updated last month
- ☆35 · Updated 10 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ☆46 · Updated 7 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆78 · Updated 3 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆74 · Updated 6 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆56 · Updated 10 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆61 · Updated last week
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆104 · Updated 2 weeks ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆55 · Updated 9 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… ☆50 · Updated last year
- This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vi… ☆104 · Updated 7 months ago
- CLIP-MoE: Mixture of Experts for CLIP ☆34 · Updated 7 months ago
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat… ☆78 · Updated 2 months ago
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042) ☆34 · Updated 2 months ago
- Official repository of MMDU dataset ☆90 · Updated 7 months ago
- [CVPR2024] ModaVerse: Efficiently Transforming Modalities with LLMs ☆29 · Updated 10 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆87 · Updated last year
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆137 · Updated 6 months ago
- ☆91 · Updated last year
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆90 · Updated last week
- ☆51 · Updated last year