web-arena-x/visualwebarena

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/web-arena-x/visualwebarena)

web-arena-x / visualwebarena

VisualWebArena is a benchmark for multimodal agents.

☆470

Alternatives and similar repositories for visualwebarena

Users that are interested in visualwebarena are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

web-arena-x / webarena
View on GitHub
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
☆1,471Nov 26, 2025Updated 5 months ago
OSU-NLP-Group / Mind2Web
View on GitHub
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" -- the first LLM-based web agent and benchmark for generalist w…
☆991Nov 5, 2025Updated 6 months ago
THUDM / VisualAgentBench
View on GitHub
Towards Large Multimodal Models as Visual Foundation Agents
☆265Apr 24, 2025Updated last year
OSU-NLP-Group / SeeAct
View on GitHub
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…
☆846Feb 3, 2025Updated last year
ServiceNow / BrowserGym
View on GitHub
🌎💪 BrowserGym, a Gym environment for web task automation
☆1,224Mar 17, 2026Updated 2 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
ServiceNow / WorkArena
View on GitHub
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
☆249Apr 25, 2026Updated 3 weeks ago
ServiceNow / AgentLab
View on GitHub
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re…
☆582Mar 17, 2026Updated 2 months ago
shulin16 / MMInA
View on GitHub
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆54Feb 27, 2025Updated last year
kohjingyu / search-agents
View on GitHub
Code for the paper 🌳 Tree Search for Language Model Agents
☆222Jul 25, 2024Updated last year
xlang-ai / OSWorld
View on GitHub
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
☆2,866Updated this week
McGill-NLP / weblinx
View on GitHub
WebLINX is a benchmark for building web navigation agents with conversational capabilities
☆160Feb 11, 2025Updated last year
njucckevin / SeeClick
View on GitHub
The model, data and code for the visual GUI Agent SeeClick
☆480Jul 13, 2025Updated 10 months ago
OSU-NLP-Group / UGround
View on GitHub
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
☆315Mar 11, 2026Updated 2 months ago
Gabesarch / ICAL
View on GitHub
☆52May 11, 2025Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
asappresearch / webagents-step
View on GitHub
☆41Jul 21, 2024Updated last year
THUDM / WebRL
View on GitHub
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
☆524Jun 6, 2025Updated 11 months ago
princeton-nlp / WebShop
View on GitHub
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆540Sep 6, 2024Updated last year
RUCBM / GUICourse
View on GitHub
GUICourse: From General Vision Langauge Models to Versatile GUI Agents
☆141Mar 1, 2026Updated 2 months ago
google-research / android_world
View on GitHub
AndroidWorld is an environment and benchmark for autonomous agents
☆767Apr 9, 2026Updated last month
MinorJerry / WebVoyager
View on GitHub
Code for "WebVoyager: WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
☆1,087Mar 4, 2024Updated 2 years ago
ltzheng / Synapse
View on GitHub
[ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
☆69Jan 7, 2026Updated 4 months ago
cooelf / Auto-GUI
View on GitHub
Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)
☆260Jul 16, 2024Updated last year
THUDM / AgentBench
View on GitHub
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆3,444Feb 8, 2026Updated 3 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
JHU-CLSP / turking-bench
View on GitHub
Web-grounded natural language instructions
☆18Nov 25, 2024Updated last year
THUDM / AutoWebGLM
View on GitHub
An LLM-based Web Navigating Agent (KDD'24)
☆932Sep 27, 2024Updated last year
OSU-NLP-Group / GUI-Agents-Paper-List
View on GitHub
Awesome GUI Agent Paper List
☆785May 12, 2026Updated last week
oriyor / assistantbench
View on GitHub
Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆70Dec 9, 2024Updated last year
Farama-Foundation / miniwob-plusplus
View on GitHub
MiniWoB++: a web interaction benchmark for reinforcement learning
☆381Apr 6, 2026Updated last month
microsoft / WindowsAgentArena
View on GitHub
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
☆860Apr 13, 2026Updated last month
showlab / videogui
View on GitHub
[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
☆52Feb 22, 2026Updated 3 months ago
ddupont808 / GPT-4V-Act
View on GitHub
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
☆1,061Dec 9, 2024Updated last year
DigiRL-agent / digirl
View on GitHub
Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
☆395Feb 22, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
chuyg1005 / seeclick-crawler
View on GitHub
☆20Apr 24, 2024Updated 2 years ago
google-deepmind / pix2act
View on GitHub
☆60Jan 9, 2024Updated 2 years ago
zorazrw / trove
View on GitHub
[ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
☆33Sep 20, 2024Updated last year
xlang-ai / aguvis
View on GitHub
[ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
☆390Mar 7, 2025Updated last year
hkust-nlp / AgentBoard
View on GitHub
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆415May 20, 2024Updated 2 years ago
Berkeley-NLP / Agent-Eval-Refine
View on GitHub
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆149Nov 26, 2024Updated last year
zorazrw / agent-workflow-memory
View on GitHub
AWM: Agent Workflow Memory
☆430Dec 22, 2025Updated 5 months ago