☆491Sep 25, 2024Updated last year
Alternatives and similar repositories for awesome-large-multimodal-agents
Users that are interested in awesome-large-multimodal-agents are comparing it to the libraries listed below.
- Latest Advances on Multimodal Large Language Models☆17,829May 1, 2026Updated 3 weeks ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆54Feb 27, 2025Updated last year
- Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...☆2,234Apr 30, 2025Updated last year
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…☆845Feb 3, 2025Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated 11 months ago
- This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥☆1,798May 11, 2026Updated 2 weeks ago
- Towards Large Multimodal Models as Visual Foundation Agents☆266Apr 24, 2025Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆767Feb 1, 2024Updated 2 years ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Sep 19, 2023Updated 2 years ago
- A fork to add multimodal model training to open-r1☆1,548Feb 8, 2025Updated last year
- [ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"☆515Updated this week
- Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi…☆791Sep 11, 2025Updated 8 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆141Mar 1, 2026Updated 2 months ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆1,004May 22, 2026Updated last week
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…☆1,412May 11, 2026Updated 2 weeks ago
- Awesome GUI Agent Paper List☆785May 12, 2026Updated 2 weeks ago
- The paper list of the 86-page SCIS cover paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et a…☆8,134Sep 12, 2025Updated 8 months ago
- List of language agents based on paper "Cognitive Architectures for Language Agents"☆1,222Jan 16, 2025Updated last year
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning☆142Oct 10, 2025Updated 7 months ago
- From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓☆3,622Apr 20, 2026Updated last month
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆424Apr 22, 2025Updated last year
- ☆11Dec 20, 2024Updated last year
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A…☆49Jan 28, 2024Updated 2 years ago
- ☆920Jul 24, 2024Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models☆149Updated this week
- This is the repository for the Tool Learning survey.☆483Aug 9, 2025Updated 9 months ago
- 🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.☆3,188Mar 28, 2026Updated 2 months ago
- The model, data and code for the visual GUI Agent SeeClick☆482Jul 13, 2025Updated 10 months ago
- a state-of-the-art-level open visual language model | multimodal pretrained model☆6,742May 29, 2024Updated 2 years ago
- OpenEQA Embodied Question Answering in the Era of Foundation Models☆356Sep 20, 2024Updated last year
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆90Feb 6, 2026Updated 3 months ago
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e…☆158Jan 3, 2026Updated 4 months ago
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges (In IJCAI 2024)☆1,265Apr 19, 2026Updated last month
- A repo listing papers related to LLM-based agents☆2,302Jul 12, 2025Updated 10 months ago
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod…☆372Mar 19, 2025Updated last year
- Code for the paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆54Dec 12, 2024Updated last year
- ☆2,899Feb 20, 2025Updated last year
- VisualWebArena is a benchmark for multimodal agents.☆476Nov 9, 2024Updated last year
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)☆3,444Feb 8, 2026Updated 3 months ago