The-Martyr/Awesome-Multimodal-Reasoning


Latest Advances on (RL-based) Multimodal Reasoning and Generation in Multimodal LLMs

☆83

Alternatives and similar repositories for Awesome-Multimodal-Reasoning

Users who are interested in Awesome-Multimodal-Reasoning are comparing it to the repositories listed below.

fansunqi / VideoTool
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
☆23 · May 18, 2026 · Updated 2 months ago

Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides a valuable reference for researchers in the field of multimodality; please start your exploratory travel in RL-bas…
☆1,438 · May 11, 2026 · Updated 2 months ago

Wang-Xiaodong1899 / Open-R1-Video
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆382 · Jul 1, 2026 · Updated 3 weeks ago

BRZ911 / ViTCoT
[ACM MM 2025] ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
☆18 · Jul 15, 2025 · Updated last year

LJungang / Awesome-Video-Reasoning-Landscape
🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.
☆190 · Jun 14, 2026 · Updated last month

The-Martyr / CausalMM
[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
☆67 · Jul 5, 2025 · Updated last year

ligeng0197 / Awesome-Thinking-With-Images
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain…
☆113 · Aug 21, 2025 · Updated 11 months ago

ant-research / Awesome-Fine-Grained-Multimodal-Perception
A collection of the latest research and resources on Fine-Grained Multimodal Perception
☆30 · Jun 4, 2026 · Updated last month

zyayoung / Awesome-Video-LLMs
Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology.
☆36 · Jan 20, 2024 · Updated 2 years ago

ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆771 · Sep 7, 2025 · Updated 10 months ago

www-Ye / Time-R1
R1-like Video-LLM for Temporal Grounding
☆138 · Jun 20, 2025 · Updated last year

OuyangKun10 / Conan
Multi-step reasoning MLLM
☆25 · Mar 8, 2026 · Updated 4 months ago

shengyangsun / TDSD
Official repository of "TDSD: Text-Driven Scene-Decoupled Weakly Supervised Video Anomaly Detection"
☆11 · May 25, 2025 · Updated last year

The-Martyr / Awesome-Modality-Priors-in-MLLMs
Latest Advances on Modality Priors in Multimodal Large Language Models
☆30 · Dec 10, 2025 · Updated 7 months ago

tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆884 · Dec 14, 2025 · Updated 7 months ago

hmyao22 / DADF
The official implementation of the paper DADF for industrial VAD
☆13 · Dec 1, 2023 · Updated 2 years ago

MILVLG / twigvlm
Implementation of ICCV 2025 paper "Growing a Twig to Accelerate Large Vision-Language Models".
☆30 · May 23, 2026 · Updated 2 months ago

CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year

threegold116 / Awesome-Omni-MLLMs
This is for the ACL 2025 Findings paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities Models
☆103 · Mar 22, 2026 · Updated 4 months ago

AI9Stars / AStar-Thought
[NeurIPS 2025] A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
☆16 · Jun 12, 2026 · Updated last month

TideDra / lmm-r1
Extends OpenRLHF to support LMM RL training for reproducing DeepSeek-R1 on multimodal tasks.
☆848 · May 14, 2025 · Updated last year

appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆62 · Jun 6, 2025 · Updated last year

OpenDCAI / Awesome_MLLMs_Reasoning
☆112 · Sep 11, 2025 · Updated 10 months ago

Matt-Su / DR-Adapter
☆22 · May 12, 2024 · Updated 2 years ago

yueshengbin / SMART
[AAAI 2025 Oral] Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks
☆31 · Apr 14, 2025 · Updated last year

sanbuphy / computer-vision-reference
A collection of the world's best computer vision labs and lecture materials.
☆15 · Feb 23, 2025 · Updated last year

adxcreative / EERCF
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
☆21 · Feb 19, 2025 · Updated last year

Tennine2077 / HiDe
[ICML 2026] HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling
☆27 · May 2, 2026 · Updated 2 months ago

RizwanAliQau / tasad
☆11 · Apr 28, 2026 · Updated 3 months ago

VisionOPD / Vision-OPD
Vision-OPD is a regional-to-global on-policy self-distillation framework that transfers a model's own privileged crop-conditioned percept…
☆222 · Jul 17, 2026 · Updated last week

yaotingwangofficial / Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆1,016 · May 22, 2026 · Updated 2 months ago

emrecanacikgoz / awesome-conversational-agents
Awesome paper list for "A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions"
☆34 · Apr 25, 2025 · Updated last year

dengbaowang / Rethinking-Calibration-of-Deep-Neural-Networks
☆16 · Nov 19, 2021 · Updated 4 years ago

ZJU-REAL / Perceive-to-Reason
Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning
☆32 · Jul 8, 2026 · Updated 3 weeks ago

kevinliang888 / IVR-QA-baselines
[ICCV 2023] Simple Baselines for Interactive Video Retrieval with Questions and Answers
☆20 · Apr 16, 2024 · Updated 2 years ago

XIAO4579 / PRISM
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
☆98 · May 6, 2026 · Updated 2 months ago

RUCAIBox / Event-Bench
Official code of "Towards Event-oriented Long Video Understanding"
☆12 · Jul 26, 2024 · Updated 2 years ago

cokeshao / Awesome-Multimodal-Token-Compression
[TMLR 2026] Survey: https://arxiv.org/pdf/2507.20198
☆374 · Updated this week

hmyao22 / GLCF
Implementation for paper: "Learning Global-Local Correspondence with Semantic Bottleneck for Logical Anomaly Detection"
☆14 · Aug 13, 2023 · Updated 2 years ago