ADaM-BJTU / model-native-agentic-ai
Our survey's paper list on Agentic AI, continuously updated with the latest research.
☆46Updated this week
Alternatives and similar repositories for model-native-agentic-ai
Users who are interested in model-native-agentic-ai are comparing it to the repositories listed below
- ☆91Updated last year
 - Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆76Updated 3 months ago
 - This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning!☆54Updated 7 months ago
 - VeriGUI: Verifiable Long-Chain GUI Dataset☆82Updated last week
 - GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts☆34Updated last month
 - Official repo for EscapeCraft (a 3D environment for room escape) and benchmark MM-Escape. This work is accepted by ICCV 2025.☆34Updated 3 months ago
 - (ICLR'25) A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents☆87Updated 9 months ago
 - ☆98Updated 9 months ago
 - [NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models☆47Updated last month
 - [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time☆84Updated 4 months ago
 - MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…☆340Updated 2 months ago
 - Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆48Updated 2 months ago
 - [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆59Updated 2 months ago
 - [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆228Updated 3 months ago
 - Parameter-Efficient Fine-Tuning for Foundation Models☆96Updated 7 months ago
 - Pixel-Level Reasoning Model trained with RL [NeurIPS25]☆244Updated last month
 - The first attempt to replicate o3-like visual clue-tracking reasoning capabilities.☆58Updated 3 months ago
 - MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency☆132Updated 2 months ago
 - Repository for the NeurIPS 2024 paper "SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up…☆25Updated 10 months ago
 - SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward☆85Updated 2 months ago
 - ☆45Updated 10 months ago
 - CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025)☆156Updated 2 weeks ago
 - [Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics]: VisuoThink: Empowering LVLM Reasoning with Mul…☆30Updated 3 months ago
 - Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."☆131Updated 3 months ago
 - [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆88Updated last year
 - The Next Step Forward in Multimodal LLM Alignment☆184Updated 6 months ago
 - ☆116Updated 2 weeks ago
 - ☆60Updated last month
 - Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆126Updated 11 months ago
 - Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents☆196Updated 5 months ago