HarryHsing / EchoInkLinks
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥 The Exploration of R1 for General Audio-Visual Reasoning with Qwen2.5-Omni]
☆36 · Updated last month
Alternatives and similar repositories for EchoInk
Users that are interested in EchoInk are comparing it to the libraries listed below
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆69 · Updated 3 weeks ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆46 · Updated 7 months ago
- ☆80 · Updated 5 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆61 · Updated 2 weeks ago
- This is for the ACL 2025 Findings paper "From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-Modalities" ☆36 · Updated last week
- ☆91 · Updated last year
- Official repository of the MMDU dataset ☆92 · Updated 8 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆66 · Updated 11 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆26 · Updated 6 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ☆46 · Updated last month
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆67 · Updated last month
- Code for DeCo: Decoupling token compression from semantic abstraction in multimodal large language models ☆38 · Updated 11 months ago
- Official implementation of MIA-DPO ☆58 · Updated 5 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆56 · Updated 9 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆25 · Updated 2 months ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆123 · Updated 2 months ago
- Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆49 · Updated last month
- ☆32 · Updated 3 weeks ago
- Official code for the paper "GRIT: Teaching MLLMs to Think with Images" ☆98 · Updated this week
- ☆44 · Updated 5 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆111 · Updated last month
- Multimodal RewardBench ☆41 · Updated 4 months ago
- HallE-Control: Controlling Object Hallucination in LMMs ☆31 · Updated last year
- ☆86 · Updated 3 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆47 · Updated 3 months ago
- [TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" ☆143 · Updated 7 months ago
- GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025) ☆62 · Updated 3 weeks ago
- LMM solved catastrophic forgetting, AAAI 2025 ☆43 · Updated 2 months ago
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042) ☆54 · Updated 3 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆88 · Updated last year