☆490Sep 25, 2024Updated last year
Alternatives and similar repositories for awesome-large-multimodal-agents
Users that are interested in awesome-large-multimodal-agents are comparing it to the libraries listed below.
- Latest Advances on Multimodal Large Language Models☆17,736May 1, 2026Updated last week
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆53Feb 27, 2025Updated last year
- Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...☆2,227Apr 30, 2025Updated last year
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…☆846Feb 3, 2025Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated 10 months ago
- This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥☆1,780Apr 20, 2026Updated 2 weeks ago
- Towards Large Multimodal Models as Visual Foundation Agents☆265Apr 24, 2025Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆766Feb 1, 2024Updated 2 years ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Sep 19, 2023Updated 2 years ago
- A fork to add multimodal model training to open-r1☆1,536Feb 8, 2025Updated last year
- Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi…☆773Sep 11, 2025Updated 7 months ago
- [ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"☆510Nov 7, 2025Updated 6 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆142Mar 1, 2026Updated 2 months ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆989Nov 14, 2025Updated 5 months ago
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…☆1,408Apr 19, 2026Updated 2 weeks ago
- Awesome GUI Agent Paper List☆754Updated this week
- The paper list of the 86-page SCIS cover paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et a…☆8,113Sep 12, 2025Updated 7 months ago
- List of language agents based on paper "Cognitive Architectures for Language Agents"☆1,206Jan 16, 2025Updated last year
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning☆141Oct 10, 2025Updated 6 months ago
- From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓☆3,605Apr 20, 2026Updated 2 weeks ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆423Apr 22, 2025Updated last year
- ☆11Dec 20, 2024Updated last year
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A…☆49Jan 28, 2024Updated 2 years ago
- ☆920Jul 24, 2024Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models☆149Jul 1, 2025Updated 10 months ago
- This is the repository for the Tool Learning survey.☆483Aug 9, 2025Updated 9 months ago
- The model, data and code for the visual GUI Agent SeeClick☆478Jul 13, 2025Updated 9 months ago
- a state-of-the-art-level open visual language model | multimodal pre-trained model☆6,737May 29, 2024Updated last year
- 🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.☆3,164Mar 28, 2026Updated last month
- OpenEQA Embodied Question Answering in the Era of Foundation Models☆354Sep 20, 2024Updated last year
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆90Feb 6, 2026Updated 3 months ago
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e…☆156Jan 3, 2026Updated 4 months ago
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges (In IJCAI 2024)☆1,245Apr 19, 2026Updated 2 weeks ago
- A repo lists papers related to LLM based agent☆2,290Jul 12, 2025Updated 9 months ago
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod…☆370Mar 19, 2025Updated last year
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆53Dec 12, 2024Updated last year
- ☆2,899Feb 20, 2025Updated last year
- VisualWebArena is a benchmark for multimodal agents.☆466Nov 9, 2024Updated last year
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)☆3,399Feb 8, 2026Updated 3 months ago