☆491 · Sep 25, 2024 · Updated last year
Alternatives and similar repositories for awesome-large-multimodal-agents
Users who are interested in awesome-large-multimodal-agents are comparing it to the libraries listed below.
- Latest Advances on Multimodal Large Language Models · ☆17,889 · May 1, 2026 · Updated last month
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents · ☆54 · Feb 27, 2025 · Updated last year
- Awesome things about LLM-powered agents. Papers / Repos / Blogs / ... · ☆2,243 · Apr 30, 2025 · Updated last year
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult… · ☆848 · Feb 3, 2025 · Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" · ☆13 · Jun 22, 2025 · Updated 11 months ago
- This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥 · ☆1,815 · Updated this week
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills · ☆767 · Feb 1, 2024 · Updated 2 years ago
- Towards Large Multimodal Models as Visual Foundation Agents · ☆268 · Apr 24, 2025 · Updated last year
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models · ☆37 · Sep 19, 2023 · Updated 2 years ago
- A fork to add multimodal model training to open-r1 · ☆1,566 · Feb 8, 2025 · Updated last year
- [ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents" · ☆522 · May 24, 2026 · Updated 3 weeks ago
- Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi… · ☆801 · May 30, 2026 · Updated 2 weeks ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents · ☆142 · Mar 1, 2026 · Updated 3 months ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey · ☆1,005 · May 22, 2026 · Updated 3 weeks ago
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… · ☆1,421 · May 11, 2026 · Updated last month
- Awesome GUI Agent Paper List · ☆818 · Jun 5, 2026 · Updated last week
- The paper list of the 86-page SCIS cover paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et a… · ☆8,151 · Sep 12, 2025 · Updated 9 months ago
- List of language agents based on paper "Cognitive Architectures for Language Agents" · ☆1,234 · Jan 16, 2025 · Updated last year
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning · ☆141 · Oct 10, 2025 · Updated 8 months ago
- From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓 · ☆3,637 · Apr 20, 2026 · Updated last month
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent · ☆430 · Apr 22, 2025 · Updated last year
- ☆11 · Dec 20, 2024 · Updated last year
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A… · ☆49 · Jan 28, 2024 · Updated 2 years ago
- ☆922 · Jul 24, 2024 · Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models · ☆154 · Jun 12, 2026 · Updated last week
- This is the repository for the Tool Learning survey. · ☆484 · Aug 9, 2025 · Updated 10 months ago
- 🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.