☆492Sep 25, 2024Updated last year
Alternatives and similar repositories for awesome-large-multimodal-agents
Users who are interested in awesome-large-multimodal-agents are comparing it to the libraries listed below.
- Latest Advances on Multimodal Large Language Models☆17,624Apr 9, 2026Updated last week
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆53Feb 27, 2025Updated last year
- Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...☆2,220Apr 30, 2025Updated 11 months ago
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…☆844Feb 3, 2025Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated 9 months ago
- This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥☆1,769Feb 12, 2026Updated 2 months ago
- Towards Large Multimodal Models as Visual Foundation Agents☆263Apr 24, 2025Updated 11 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆766Feb 1, 2024Updated 2 years ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Sep 19, 2023Updated 2 years ago
- A fork to add multimodal model training to open-r1☆1,528Feb 8, 2025Updated last year
- The official GitHub page for the survey paper "A Survey on Data Augmentation in Large Model Era"☆134Jul 10, 2024Updated last year
- Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi…☆766Sep 11, 2025Updated 7 months ago
- [ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"☆508Nov 7, 2025Updated 5 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆141Mar 1, 2026Updated last month
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆977Nov 14, 2025Updated 5 months ago
- Building a comprehensive and handy list of papers for GUI agents☆719Updated this week
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…☆1,401Feb 26, 2026Updated last month
- The paper list of the 86-page SCIS cover paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et a…☆8,105Sep 12, 2025Updated 7 months ago
- List of language agents based on paper "Cognitive Architectures for Language Agents"☆1,196Jan 16, 2025Updated last year
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning☆142Oct 10, 2025Updated 6 months ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆421Apr 22, 2025Updated 11 months ago
- From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓☆3,591May 7, 2025Updated 11 months ago
- ☆12Dec 20, 2024Updated last year
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A…☆49Jan 28, 2024Updated 2 years ago
- ☆919Jul 24, 2024Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models☆150Jul 1, 2025Updated 9 months ago
- This is the repository for the Tool Learning survey.☆481Aug 9, 2025Updated 8 months ago
- The model, data and code for the visual GUI Agent SeeClick☆478Jul 13, 2025Updated 9 months ago
- a state-of-the-art-level open visual language model | 多模态预训练模型☆6,736May 29, 2024Updated last year
- 🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.☆3,151Mar 28, 2026Updated 3 weeks ago
- OpenEQA Embodied Question Answering in the Era of Foundation Models☆347Sep 20, 2024Updated last year
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆90Feb 6, 2026Updated 2 months ago
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges (In IJCAI 2024)☆1,234Nov 21, 2025Updated 4 months ago
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e…☆153Jan 3, 2026Updated 3 months ago
- A repo that lists papers related to LLM-based agents☆2,268Jul 12, 2025Updated 9 months ago
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod…☆365Mar 19, 2025Updated last year
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆53Dec 12, 2024Updated last year
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)☆3,334Feb 8, 2026Updated 2 months ago
- VisualWebArena is a benchmark for multimodal agents.☆456Nov 9, 2024Updated last year