jun0wanan / awesome-large-multimodal-agents
☆483 · Sep 25, 2024 · Updated last year
Alternatives and similar repositories for awesome-large-multimodal-agents
Users who are interested in awesome-large-multimodal-agents are comparing it to the libraries listed below.
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆48 · Feb 27, 2025 · Updated 11 months ago
- Latest Advances on Multimodal Large Language Models ☆17,337 · Feb 7, 2026 · Updated last week
- Awesome things about LLM-powered agents. Papers / Repos / Blogs / ... ☆2,201 · Apr 30, 2025 · Updated 9 months ago
- A fork to add multimodal model training to open-r1 ☆1,474 · Feb 8, 2025 · Updated last year
- This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥 ☆1,702 · Updated this week
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆763 · Feb 1, 2024 · Updated 2 years ago
- The official GitHub page for the survey paper "A Survey on Data Augmentation in Large Model Era" ☆132 · Jul 10, 2024 · Updated last year
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult… ☆822 · Feb 3, 2025 · Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 7 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆256 · Apr 24, 2025 · Updated 9 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆136 · Jul 17, 2024 · Updated last year
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models ☆37 · Sep 19, 2023 · Updated 2 years ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆412 · Apr 22, 2025 · Updated 9 months ago
- [ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents" ☆471 · Nov 7, 2025 · Updated 3 months ago
- A Survey on Benchmarks of Multimodal Large Language Models ☆148 · Jul 1, 2025 · Updated 7 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆138 · Oct 10, 2025 · Updated 4 months ago
- Building a comprehensive and handy list of papers for GUI agents ☆633 · Oct 27, 2025 · Updated 3 months ago
- Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi… ☆723 · Sep 11, 2025 · Updated 5 months ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey ☆957 · Nov 14, 2025 · Updated 3 months ago
- Paper list for Personal LLM Agents ☆424 · May 8, 2024 · Updated last year
- OpenEQA Embodied Question Answering in the Era of Foundation Models ☆340 · Sep 20, 2024 · Updated last year
- ✨✨Latest Papers and Datasets on Mobile and PC GUI Agent ☆150 · Nov 29, 2024 · Updated last year
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… ☆361 · Mar 19, 2025 · Updated 10 months ago
- The paper list of the 86-page SCIS cover paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et a… ☆8,057 · Sep 12, 2025 · Updated 5 months ago
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e… ☆147 · Jan 3, 2026 · Updated last month
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… ☆1,350 · Dec 7, 2025 · Updated 2 months ago
- List of language agents based on paper "Cognitive Architectures for Language Agents" ☆1,163 · Jan 16, 2025 · Updated last year
- a state-of-the-art-level open visual language model | multimodal pre-trained model ☆6,724 · May 29, 2024 · Updated last year
- ☆2,883 · Feb 20, 2025 · Updated 11 months ago
- From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓 ☆3,538 · May 7, 2025 · Updated 9 months ago
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A… ☆49 · Jan 28, 2024 · Updated 2 years ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202… ☆40 · May 26, 2025 · Updated 8 months ago
- This is the repository for the Tool Learning survey. ☆478 · Aug 9, 2025 · Updated 6 months ago
- Witness the aha moment of VLM with less than $3. ☆4,032 · May 19, 2025 · Updated 8 months ago
- Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey ☆474 · Dec 10, 2024 · Updated last year
- A repo lists papers related to LLM based agent ☆2,221 · Jul 12, 2025 · Updated 7 months ago
- ☆917 · Jul 24, 2024 · Updated last year
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆3,162 · Feb 8, 2026 · Updated last week
- Must-read Papers on LLM Agents. ☆2,883 · Jan 15, 2026 · Updated last month