Samsung / ClickAgent

ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents

☆22

Alternatives and similar repositories for ClickAgent

Users who are interested in ClickAgent are comparing it to the repositories listed below.

OpenGVLab / GUI-Odyssey
GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 episodes from…
☆123 Updated this week
YuxiangChai / AMEX-codebase
☆29 Updated 10 months ago
IMNearth / CoAT
Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)
☆91 Updated 9 months ago
alipay / mobile-agent
☆42 Updated last year
RUCBM / GUICourse
GUICourse: From General Vision Language Models to Versatile GUI Agents
☆125 Updated last year
Dongping-Chen / GUI-World
(ICLR 2025) The Official Code Repository for GUI-World.
☆64 Updated 7 months ago
aialt / awesome-mobile-agents
✨✨Latest Papers and Datasets on Mobile and PC GUI Agent
☆131 Updated 8 months ago
xbmxb / CoCo-Agent
☆35 Updated last year
chuyg1005 / seeclick-crawler
☆19 Updated last year
google-research-datasets / screen_annotation
The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format, and desc…
☆73 Updated last year
likaixin2000 / ScreenSpot-Pro-GUI-Grounding
GUI Grounding for Professional High-Resolution Computer Use
☆239 Updated this week
OpenGVLab / ZeroGUI
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
☆82 Updated 3 weeks ago
XiaoMi / mobilevlm
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
☆70 Updated 5 months ago
Yan98 / GTA1
☆83 Updated 3 weeks ago
THUDM / Android-Lab
☆228 Updated 3 months ago
X-LANCE / Mobile-Env
A Universal Platform for Training and Evaluation of Mobile Interaction
☆51 Updated 3 weeks ago
MBZUAI-LLM / web2code
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
☆87 Updated 9 months ago
AndroidArenaAgent / AndroidArena
☆43 Updated last year
google-research-datasets / screen_qa
ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K …
☆122 Updated 6 months ago
cooelf / Auto-GUI
Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)
☆244 Updated last year
OSU-NLP-Group / UGround
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
☆262 Updated 3 weeks ago
ltzheng / agent-studio
[ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents
☆214 Updated last month
fairyshine / Seal-Tools
The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmar…
☆53 Updated 9 months ago
PALIN2018 / BrowseComp-ZH
☆88 Updated 2 months ago
njucckevin / SeeClick
The model, data and code for the visual GUI Agent SeeClick
☆411 Updated 3 weeks ago
FudanNLPLAB / MouSi
☆73 Updated last year
ltzheng / Synapse
[ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
☆59 Updated 6 months ago
ByteDance-Seed / DeepFlow
[ICCV 2025] Deeply Supervised Flow-Based Generative Models
☆24 Updated last month
Alpha-VLLM / WeMix-LLM
☆17 Updated last year
xlang-ai / OSWorld-G
Scaling Computer-Use Grounding via UI Decomposition and Synthesis
☆96 Updated last month