☆487 · Sep 25, 2024 · Updated last year
Alternatives and similar repositories for awesome-large-multimodal-agents
Users that are interested in awesome-large-multimodal-agents are comparing it to the libraries listed below.
- Latest Advances on Multimodal Large Language Models ☆17,505 · Mar 20, 2026 · Updated last week
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆52 · Feb 27, 2025 · Updated last year
- Awesome things about LLM-powered agents. Papers / Repos / Blogs / ... ☆2,212 · Apr 30, 2025 · Updated 11 months ago
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult… ☆839 · Feb 3, 2025 · Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 9 months ago
- This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥 ☆1,750 · Feb 12, 2026 · Updated last month
- Towards Large Multimodal Models as Visual Foundation Agents ☆259 · Apr 24, 2025 · Updated 11 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆766 · Feb 1, 2024 · Updated 2 years ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models ☆37 · Sep 19, 2023 · Updated 2 years ago
- A fork to add multimodal model training to open-r1 ☆1,514 · Feb 8, 2025 · Updated last year
- Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi… ☆749 · Sep 11, 2025 · Updated 6 months ago
- [ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents" ☆499 · Nov 7, 2025 · Updated 4 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆138 · Mar 1, 2026 · Updated 3 weeks ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey ☆968 · Nov 14, 2025 · Updated 4 months ago
- Building a comprehensive and handy list of papers for GUI agents ☆657 · Oct 27, 2025 · Updated 5 months ago
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… ☆1,389 · Feb 26, 2026 · Updated last month
- The paper list of the 86-page SCIS cover paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et a… ☆8,093 · Sep 12, 2025 · Updated 6 months ago
- List of language agents based on paper "Cognitive Architectures for Language Agents" ☆1,192 · Jan 16, 2025 · Updated last year
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆141 · Oct 10, 2025 · Updated 5 months ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆416 · Apr 22, 2025 · Updated 11 months ago
- From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓 ☆3,575 · May 7, 2025 · Updated 10 months ago
- ☆12 · Dec 20, 2024 · Updated last year
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A… ☆49 · Jan 28, 2024 · Updated 2 years ago
- ☆919 · Jul 24, 2024 · Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models ☆150 · Jul 1, 2025 · Updated 8 months ago
- This is the repository for the Tool Learning survey. ☆480 · Aug 9, 2025 · Updated 7 months ago
- a state-of-the-art-level open visual language model | multimodal pre-trained model ☆6,736 · May 29, 2024 · Updated last year
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e… ☆150 · Jan 3, 2026 · Updated 2 months ago
- 🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs. ☆3,126 · Mar 19, 2026 · Updated last week
- OpenEQA Embodied Question Answering in the Era of Foundation Models ☆345 · Sep 20, 2024 · Updated last year
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆90 · Feb 6, 2026 · Updated last month
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges (In IJCAI 2024) ☆1,222 · Nov 21, 2025 · Updated 4 months ago
- A repo that lists papers related to LLM-based agents ☆2,245 · Jul 12, 2025 · Updated 8 months ago
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… ☆365 · Mar 19, 2025 · Updated last year
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆3,273 · Feb 8, 2026 · Updated last month
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆53 · Dec 12, 2024 · Updated last year
- The model, data and code for the visual GUI Agent SeeClick ☆475 · Jul 13, 2025 · Updated 8 months ago
- ☆2,892 · Feb 20, 2025 · Updated last year
- VisualWebArena is a benchmark for multimodal agents. ☆454 · Nov 9, 2024 · Updated last year