yuanpinz/awesome-deep-multimodal-reasoning

A collection of awesome works that revolve around reasoning models like O1/R1 in the visual domain

☆55

Alternatives and similar repositories for awesome-deep-multimodal-reasoning

Users who are interested in awesome-deep-multimodal-reasoning are comparing it to the libraries listed below.

zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,493 · Mar 9, 2026 · Updated 4 months ago
real-absolute-AI / NoisyRollout
[NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆112 · Sep 18, 2025 · Updated 10 months ago
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,435 · May 11, 2026 · Updated 2 months ago
MikeWangWZHL / PAPO
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
☆151 · Feb 4, 2026 · Updated 5 months ago
wwfnb / Laser
☆16 · Sep 16, 2025 · Updated 10 months ago
zht8506 / UniHead
This is the repository for TNNLS paper: "Unihead: unifying multi-perception for detection heads"
☆15 · Jan 13, 2025 · Updated last year
alibaba-damo-academy / VL-Cogito
☆24 · Nov 4, 2025 · Updated 8 months ago
narthchin / DEIQT
Checkpoints, logs and source code for AAAI-23 paper 'Data-Efficient Image Quality Assessment with Attention-Panel Decoder'
☆39 · Apr 3, 2024 · Updated 2 years ago
Thinklab-SJTU / BiLAF
Official implementation of Our NeurIPS 2024 Paper "Boundary Matters: A Bi-Level Active Finetuning Method"
☆14 · Feb 11, 2025 · Updated last year
MinJieDev / Roadmap-Frontend
☆12 · Sep 11, 2020 · Updated 5 years ago
langfengQ / AgentOCR
[ACL'26 Oral] AgentOCR is a token-efficient framework that compresses multi-turn agent history by rendering it into images and adopting R…
☆39 · Mar 1, 2026 · Updated 4 months ago
h-jia / TTE
☆14 · Jul 14, 2025 · Updated last year
renytek13 / Soft-Prompt-Generation
[ECCV 2024] Soft Prompt Generation for Domain Generalization
☆33 · Oct 1, 2024 · Updated last year
Multimodal-Representation-Learning-MRL / GA-DMS
[EMNLP25 Main] The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval"
☆25 · Mar 30, 2026 · Updated 3 months ago
Geaming2002 / T5-Model-migration
☆10 · Oct 27, 2023 · Updated 2 years ago
CSfufu / Revisual-R1
[ICLR 2026] 🚀 ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, mul…
☆212 · Dec 10, 2025 · Updated 7 months ago
zjr2000 / REVERIE
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
☆20 · Jul 17, 2024 · Updated 2 years ago
hustzhangyuxin / LLBNAD
[SMC 2025] Leverage learning bias for noisy anomaly detection.
☆19 · Oct 24, 2025 · Updated 8 months ago
ZJU-REAL / BEACON
[ICML 2026] Milestone-Guided Policy Learning for Long-Horizon Language Agents
☆37 · May 29, 2026 · Updated last month
luo-junyu / COUPLE
☆12 · Jul 31, 2024 · Updated last year
MAGIC-AI4Med / RadABench
The official codes for "Can Modern LLMs Act as Agent Cores in Radiology Environments?"
☆29 · Jan 22, 2025 · Updated last year
adobe-research / llava-score
☆11 · Oct 2, 2024 · Updated last year
thunlp / KARL
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
☆68 · Apr 5, 2026 · Updated 3 months ago
zht8506 / UniQA
This is the repository for paper "UniQA: Unified Vision-Language Pre-training of Quality and Aesthetics"
☆28 · Mar 12, 2025 · Updated last year
kayzliu / godm
Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models
☆15 · Sep 3, 2025 · Updated 10 months ago
NOVAglow646 / LLM-MLLM-paper-list
A paper list on LLMs and multimodal LLMs
☆60 · Jun 17, 2026 · Updated last month
GQ93 / Pytorch-geometric-notes
The notes for pytorch geometric learning
☆10 · Jul 5, 2020 · Updated 6 years ago
shiranzada / pure-noise
Official implementation for "Pure Noise to the Rescue of Insufficient Data: Improving Imbalanced Classification by Training on Random Noi…
☆15 · Jun 11, 2022 · Updated 4 years ago
zht8506 / ETDNet
Code of IEEE TIM Paper: ETDNet: Efficient Transformer-Based Detection Network for Surface Defect Detection
☆28 · Oct 16, 2023 · Updated 2 years ago
uulm-mrm / aduulm_360_dataset
ADUULM-360 dataset access, tools, and baseline models
☆10 · Sep 11, 2024 · Updated last year
onion-liu / awesome-image-translation-diffusion
A collection of awesome resources on image-to-image translation with diffusion models.
☆16 · Mar 5, 2023 · Updated 3 years ago
GLJS / audio-datasets
GitHub Repository for the Survey Paper on Audio-Language Datasets for Scenes and Events
☆17 · Feb 7, 2025 · Updated last year
fscdc / Awesome-Efficient-Reasoning-Models
[TMLR 2025] Efficient Reasoning Models: A Survey
☆314 · Jun 26, 2026 · Updated 3 weeks ago
Shuyu-XJTU / CMP
The official code of "Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search"
☆33 · Updated this week
iOPENCap / awesome-unimodal-training
Text-only training or language-free training for multimodal tasks (image/audio/video caption, retrieval, text2image)
☆12 · Oct 15, 2024 · Updated last year
MICV-yonsei / STORM
[CVPR 2025] Official Pytorch Code for Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synth…
☆15 · Jun 21, 2025 · Updated last year
CityU-AIM-Group / EPT
Edge-oriented Point cloud Transformer for 3D Intracranial Aneurysm Segmentation. MICCAI22
☆13 · Aug 18, 2022 · Updated 3 years ago
YeeZ93 / Awesome-Object-Centric-Learning
A curated list of research in object-centric learning
☆11 · Oct 14, 2024 · Updated last year
zhaochen0110 / OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
☆399 · Jun 1, 2025 · Updated last year