THUDM/VisualAgentBench

Towards Large Multimodal Models as Visual Foundation Agents

☆270

Alternatives and similar repositories for VisualAgentBench

Users who are interested in VisualAgentBench are comparing it to the repositories listed below.

web-arena-x / visualwebarena
VisualWebArena is a benchmark for multimodal agents.
☆482 · Nov 9, 2024 · Updated last year
THUDM / WebRL
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
☆535 · Jun 6, 2025 · Updated last year
THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆3,586 · Feb 8, 2026 · Updated 5 months ago
THUDM / Android-Lab
☆322 · Aug 18, 2025 · Updated 11 months ago
IMNearth / CoAT
Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)
☆103 · Oct 14, 2024 · Updated last year
OSU-NLP-Group / GUI-Agents-Paper-List
Awesome GUI Agent Paper List
☆861 · Jun 28, 2026 · Updated 3 weeks ago
jylee425 / b-moca
Benchmarking Mobile Device Control Agents across Diverse Configurations (ICLR 2024 workshop GenAI4DM spotlight presentation; CoLLAs 2025)
☆34 · Jul 21, 2025 · Updated 11 months ago
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆68 · Oct 19, 2024 · Updated last year
njucckevin / SeeClick
The model, data and code for the visual GUI Agent SeeClick
☆490 · Jul 13, 2025 · Updated last year
OSU-NLP-Group / UGround
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
☆314 · Mar 11, 2026 · Updated 4 months ago
RUCBM / GUICourse
GUICourse: From General Vision Language Models to Versatile GUI Agents
☆143 · Mar 1, 2026 · Updated 4 months ago
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆149 · Nov 26, 2024 · Updated last year
shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆54 · Feb 27, 2025 · Updated last year
web-arena-x / webarena
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
☆1,550 · Nov 26, 2025 · Updated 7 months ago
WeiminXiong / RationaleCL
Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference)
☆12 · Oct 11, 2023 · Updated 2 years ago
pkunlp-icler / PCA-EVAL
[ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
☆107 · Mar 14, 2024 · Updated 2 years ago
DigiRL-agent / digirl
Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
☆393 · Feb 22, 2025 · Updated last year
embodied-agent-interface / embodied-agent-interface
Embodied Agent Interface (EAI): Benchmarking LLMs for Embodied Decision Making (NeurIPS D&B 2024 Oral)
☆295 · Mar 6, 2025 · Updated last year
OSU-NLP-Group / SeeAct
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…
☆851 · Feb 3, 2025 · Updated last year
language-agent-tutorial / language-agent-tutorial.github.io
[EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks
☆10 · Nov 27, 2024 · Updated last year
xlang-ai / OSWorld-G
[NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis
☆172 · Jun 18, 2026 · Updated last month
WooooDyy / AgentGym
Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi…
☆813 · May 30, 2026 · Updated last month
kohjingyu / search-agents
Code for the paper 🌳 Tree Search for Language Model Agents
☆223 · Jul 25, 2024 · Updated last year
THUDM / AutoWebGLM
An LLM-based Web Navigating Agent (KDD'24)
☆930 · Sep 27, 2024 · Updated last year
OSU-NLP-Group / Mind2Web
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" -- the first LLM-based web agent and benchmark for generalist w…
☆1,015 · Nov 5, 2025 · Updated 8 months ago
Dongping-Chen / GUI-World
(ICLR 2025) The Official Code Repository for GUI-World.
☆69 · Dec 18, 2024 · Updated last year
OpenGVLab / GUI-Odyssey
[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e…
☆159 · Jan 3, 2026 · Updated 6 months ago
OSU-NLP-Group / WebDreamer
[TMLR'25] "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"
☆104 · Oct 5, 2025 · Updated 9 months ago
ltzheng / Synapse
[ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
☆69 · Jan 7, 2026 · Updated 6 months ago
OSU-NLP-Group / SeeActChromeExtension
☆18 · Jan 3, 2025 · Updated last year
OS-Copilot / OS-Atlas
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
☆452 · Apr 20, 2025 · Updated last year
xlang-ai / OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
☆3,026 · Updated this week
ai-agents-2030 / SPA-Bench
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
☆64 · Jul 11, 2025 · Updated last year
jun0wanan / awesome-large-multimodal-agents
☆495 · Sep 25, 2024 · Updated last year
OSU-NLP-Group / ACuRL
An Autonomous Curriculum Reinforcement Learning framework that steers agents to continually learn in specific environments with zero huma…
☆38 · Jun 7, 2026 · Updated last month
OSU-NLP-Group / Online-Mind2Web
An Illusion of Progress? Assessing the Current State of Web Agents
☆191 · Jun 25, 2026 · Updated 3 weeks ago
ServiceNow / AgentLab
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re…
☆606 · Updated this week
showlab / Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
☆1,197 · Aug 17, 2025 · Updated 11 months ago
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆426 · May 20, 2024 · Updated 2 years ago