HJYao00/Awesome-Agentic-MLLMs

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/HJYao00/Awesome-Agentic-MLLMs)

HJYao00 / Awesome-Agentic-MLLMs

Agentic MLLMs

☆216

Alternatives and similar repositories for Awesome-Agentic-MLLMs

Users that are interested in Awesome-Agentic-MLLMs are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

HJYao00 / MM-DeepResearch
View on GitHub
MLLM, DeepResearch, Agentic AI
☆18Jun 1, 2026Updated last month
HJYao00 / R1-ShareVL
View on GitHub
[NeurIPS 2025] Reasoning MLLM, Share-GRPO, advantage vanishing, sparse reward
☆38Sep 19, 2025Updated 10 months ago
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
View on GitHub
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,435May 11, 2026Updated 2 months ago
Visual-Agent / DeepEyes
View on GitHub
☆1,251Nov 20, 2025Updated 8 months ago
ChoS3nE11ven / Agentic-MME
View on GitHub
☆36Apr 13, 2026Updated 3 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
zhaochen0110 / Awesome_Think_With_Images
View on GitHub
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,493Mar 9, 2026Updated 4 months ago
HJYao00 / MMReason
View on GitHub
[ICCV 2025] MMReason, MLLMs, step by step, reasoning benchmark, AGI
☆15Apr 25, 2026Updated 3 months ago
open-compass / VLMEvalKit
View on GitHub
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆4,302Updated this week
DeepExperience / HyperEyes
View on GitHub
HyperEyes is a parallel multimodal search agent that fuses visual grounding and retrieval into a single atomic action, enabling concurren…
☆70May 23, 2026Updated 2 months ago
LunarShen / DsicoVLA
View on GitHub
[CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
☆22Jun 23, 2025Updated last year
jxhuang0508 / Awesome-LLM-Reasoning-OpenAI-o1
View on GitHub
Awesome LLM papers, news and projects about learning to reason with LLM, OpenAI o1, reasonning techniques, chain-of-thought (COT), Large …
☆28Oct 10, 2024Updated last year
HJYao00 / DenseConnector
View on GitHub
【NeurIPS 2024】Dense Connector for MLLMs
☆183Oct 14, 2024Updated last year
Gen-Verse / Open-AgentRL
View on GitHub
RLAnything (ICML 2026) & AutoTool (ICML 2026), DemyAgent: Open-Source RL for LLMs and Agentic Scenarios
☆592Jun 12, 2026Updated last month
takomc / amp
View on GitHub
【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"
☆22Sep 26, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
bronyayang / Law_of_Vision_Representation_in_MLLMs
View on GitHub
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆177Oct 6, 2025Updated 9 months ago
yaotingwangofficial / Awesome-MCoT
View on GitHub
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆1,017May 22, 2026Updated 2 months ago
HITsz-TMG / Awesome-Large-Multimodal-Reasoning-Models
View on GitHub
The development and future prospects of large multimodal reasoning models.
☆614Jan 9, 2026Updated 6 months ago
zzxslp / SoM-LLaVA
View on GitHub
[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
☆145Aug 23, 2024Updated last year
threegold116 / Awesome-Omni-MLLMs
View on GitHub
This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalitiesModels
☆103Mar 22, 2026Updated 4 months ago
weijiawu / Awesome-RL-for-Multimodal-Foundation-Models
View on GitHub
📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning.
☆451Apr 28, 2026Updated 2 months ago
HHHHHejia / Awesome-AgenticLLM-RL-Papers
View on GitHub
☆1,846Jun 18, 2026Updated last month
rootyJeon / Vision-aligned-Latent-Reasoning
View on GitHub
[ICML 2026] Official implementation of Vision-aligned Latent Reasoning for Multi-modal Large Language Model (VaLR)
☆20Apr 30, 2026Updated 2 months ago
TsinghuaC3I / Awesome-RL-for-LRMs
View on GitHub
A Survey of Reinforcement Learning for Large Reasoning Models
☆2,469Nov 9, 2025Updated 8 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
swordlidev / Evaluation-Multimodal-LLMs-Survey
View on GitHub
A Survey on Benchmarks of Multimodal Large Language Models
☆156Jul 13, 2026Updated last week
X-PLUG / ToolCUA
View on GitHub
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
☆58May 13, 2026Updated 2 months ago
whwu95 / FreeVA
View on GitHub
FreeVA: Offline MLLM as Training-Free Video Assistant
☆69Jun 9, 2024Updated 2 years ago
Visual-Agent / DeepEyesV2
View on GitHub
☆624Feb 26, 2026Updated 5 months ago
XiaoYee / Awesome_Efficient_LRM_Reasoning
View on GitHub
😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond
☆355Jan 22, 2026Updated 6 months ago
DreamMr / RAP
View on GitHub
Code for Retrieval-Augmented Perception （ICML 2025)
☆74Apr 22, 2026Updated 3 months ago
inclusionAI / M2-Reasoning
View on GitHub
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
☆47Jul 17, 2025Updated last year
marinero4972 / Open-o3-Video
View on GitHub
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157May 1, 2026Updated 2 months ago
Purshow / Awesome-Unified-Multimodal
View on GitHub
📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.
☆365Jan 8, 2026Updated 6 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
Haochen-Wang409 / TreeVGR
View on GitHub
[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
☆92Jan 26, 2026Updated 5 months ago
HJYao00 / Side4Video
View on GitHub
☆42Apr 7, 2024Updated 2 years ago
yfzhang114 / Thyme
View on GitHub
✨✨ [ICLR 2026] Think Beyond Images
☆583Sep 23, 2025Updated 10 months ago
TIGER-AI-Lab / VisualWebInstruct
View on GitHub
The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" [EMNLP25]
☆39Feb 1, 2026Updated 5 months ago
Mini-o3 / Mini-o3
View on GitHub
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆423Jan 29, 2026Updated 5 months ago
EvolvingLMMs-Lab / OpenMMReasoner
View on GitHub
[CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
☆164Mar 30, 2026Updated 3 months ago
EvolvingLMMs-Lab / LongVT
View on GitHub
[CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
☆255Jun 24, 2026Updated last month