RUCBM/GUICourse

RUCBM / GUICourse

GUICourse: From General Vision Language Models to Versatile GUI Agents

☆136

Alternatives and similar repositories for GUICourse

Users who are interested in GUICourse are comparing it to the repositories listed below.

OpenGVLab / GUI-Odyssey
[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e…
☆147 · Jan 3, 2026 · Updated 2 months ago
njucckevin / SeeClick
The model, data and code for the visual GUI Agent SeeClick
☆469 · Jul 13, 2025 · Updated 7 months ago
Dongping-Chen / GUI-World
(ICLR 2025) The Official Code Repository for GUI-World.
☆68 · Dec 18, 2024 · Updated last year
IMNearth / CoAT
Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)
☆99 · Oct 14, 2024 · Updated last year
chuyg1005 / seeclick-crawler
☆20 · Apr 24, 2024 · Updated last year
YuxiangChai / AMEX-codebase
☆31 · Sep 27, 2024 · Updated last year
OSU-NLP-Group / UGround
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
☆300 · Jul 18, 2025 · Updated 7 months ago
cooelf / Auto-GUI
Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)
☆255 · Jul 16, 2024 · Updated last year
JHU-CLSP / turking-bench
Web-grounded natural language instructions
☆18 · Nov 25, 2024 · Updated last year
921112343 / GUI-Xplore
[CVPR 2025] GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
☆20 · Mar 21, 2025 · Updated 11 months ago
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆64 · Oct 19, 2024 · Updated last year
DigiRL-agent / digirl
Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
☆389 · Feb 22, 2025 · Updated last year
HazyResearch / wonderbread
WONDERBREAD benchmark + dataset for BPM tasks
☆34 · Jul 30, 2025 · Updated 7 months ago
showlab / Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
☆1,125 · Aug 17, 2025 · Updated 6 months ago
shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆48 · Feb 27, 2025 · Updated last year
jylee425 / b-moca
Benchmarking Mobile Device Control Agents across Diverse Configurations (ICLR 2024 workshop GenAI4DM spotlight presentation; CoLLAs 2025)
☆35 · Jul 21, 2025 · Updated 7 months ago
OS-Copilot / OS-Genesis
[ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
☆180 · Oct 8, 2025 · Updated 4 months ago
xbmxb / CoCo-Agent
☆35 · Jun 20, 2024 · Updated last year
google-research / android_world
AndroidWorld is an environment and benchmark for autonomous agents
☆640 · Feb 24, 2026 · Updated last week
google-research-datasets / screen_qa
ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K …
☆139 · Feb 7, 2025 · Updated last year
Samsung / ClickAgent
ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents
☆28 · Oct 28, 2024 · Updated last year
OS-Copilot / OS-Atlas
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
☆437 · Apr 20, 2025 · Updated 10 months ago
web-arena-x / visualwebarena
VisualWebArena is a benchmark for multimodal agents.
☆440 · Nov 9, 2024 · Updated last year
THUDM / Android-Lab
☆301 · Aug 18, 2025 · Updated 6 months ago
xlang-ai / aguvis
[ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
☆381 · Mar 7, 2025 · Updated 11 months ago
OSU-NLP-Group / GUI-Agents-Paper-List
Building a comprehensive and handy list of papers for GUI agents
☆641 · Oct 27, 2025 · Updated 4 months ago
ltzheng / agent-studio
[ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents
☆228 · Jun 16, 2025 · Updated 8 months ago
ai-agents-2030 / SPA-Bench
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
☆60 · Jul 11, 2025 · Updated 7 months ago
ltzheng / Synapse
[ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
☆68 · Jan 7, 2026 · Updated last month
THUDM / VisualAgentBench
Towards Large Multimodal Models as Visual Foundation Agents
☆256 · Apr 24, 2025 · Updated 10 months ago
asappresearch / webagents-step
☆41 · Jul 21, 2024 · Updated last year
google-research-datasets / screen_annotation
The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format, and desc…
☆84 · Mar 7, 2024 · Updated last year
zzxslp / MM-Navigator
GPT-4V in Wonderland: LMMs as Smartphone Agents
☆135 · Jul 17, 2024 · Updated last year
alipay / mobile-agent
☆44 · Mar 19, 2024 · Updated last year
niuzaisheng / ScreenAgent
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
☆572 · Nov 25, 2024 · Updated last year
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆88 · Jan 23, 2025 · Updated last year
likaixin2000 / ScreenSpot-Pro-GUI-Grounding
GUI Grounding for Professional High-Resolution Computer Use
☆333 · Feb 20, 2026 · Updated last week
McGill-NLP / weblinx
WebLINX is a benchmark for building web navigation agents with conversational capabilities
☆160 · Feb 11, 2025 · Updated last year
THUDM / AutoWebGLM
An LLM-based Web Navigating Agent (KDD'24)
☆929 · Sep 27, 2024 · Updated last year