threegold116 / Awesome-Omni-MLLMs
This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities
★33 · Updated last week
Alternatives and similar repositories for Awesome-Omni-MLLMs
Users interested in Awesome-Omni-MLLMs are comparing it to the libraries listed below.
- EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Vi…] — ★31 · Updated 2 weeks ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models — ★63 · Updated 2 months ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models — ★35 · Updated last month
- The Next Step Forward in Multimodal LLM Alignment — ★161 · Updated last month
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" — ★172 · Updated last week
- [CVPR 2025] RAP: Retrieval-Augmented Personalization — ★55 · Updated 2 weeks ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models — ★65 · Updated last month
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models — ★63 · Updated 10 months ago
- [ICML 2025] Official implementation of the paper "Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in…" — ★124 · Updated last week
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… — ★53 · Updated this week
- Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning — ★44 · Updated 2 weeks ago
- The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate" — ★98 · Updated 6 months ago
- Official repository of the MMDU dataset — ★91 · Updated 8 months ago
- ★74 · Updated last year
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation — ★82 · Updated 5 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs — ★23 · Updated last month
- ★29 · Updated this week
- Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 / Visual R1) — ★34 · Updated 2 months ago
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation — ★30 · Updated 2 months ago
- [CVPR 2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models" — ★163 · Updated last week
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs — ★120 · Updated 7 months ago
- HallE-Control: Controlling Object Hallucination in LMMs — ★31 · Updated last year
- ★84 · Updated 2 months ago
- Official implementation of MIA-DPO — ★58 · Updated 4 months ago
- Visual Instruction Tuning for the Qwen2 Base Model — ★34 · Updated 11 months ago
- ★37 · Updated 10 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning — ★72 · Updated 2 weeks ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model — ★46 · Updated 6 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency — ★108 · Updated last month
- ★77 · Updated 4 months ago