chuyg1005/seeclick-crawler

☆20

Alternatives and similar repositories for seeclick-crawler

Users that are interested in seeclick-crawler are comparing it to the libraries listed below.

njucckevin / SeeClick
The model, data and code for the visual GUI Agent SeeClick
☆493 · Jul 13, 2025 · Updated last year
aburns4 / textualforesight
☆12 · Aug 8, 2024 · Updated last year
boyugou / llava_uground
☆18 · Nov 1, 2024 · Updated last year
tongshuangwu / llm-crowdsourcing-pipeline
☆11 · Jul 6, 2023 · Updated 3 years ago
ltzheng / Synapse
[ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
☆70 · Jan 7, 2026 · Updated 6 months ago
cooelf / Auto-GUI
Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)
☆261 · Jul 16, 2024 · Updated 2 years ago
OSU-NLP-Group / UGround
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
☆316 · Mar 11, 2026 · Updated 4 months ago
ltzheng / agent-studio
[ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents
☆232 · Jun 16, 2025 · Updated last year
X-LANCE / Mobile-Env
A Universal Platform for Training and Evaluation of Mobile Interaction
☆63 · Sep 24, 2025 · Updated 10 months ago
XiaoMi / mobilevlm
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
☆78 · Feb 27, 2025 · Updated last year
3B-Group / ConvRe
🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse Relations (EMNLP 2023)
☆24 · Oct 10, 2023 · Updated 2 years ago
X-LANCE / META-GUI-baseline
[EMNLP 2022] The baseline code for META-GUI dataset
☆16 · Jul 9, 2024 · Updated 2 years ago
all-the-noises / eval-arena
☆34 · Mar 21, 2026 · Updated 4 months ago
likaixin2000 / ScreenSpot-Pro-GUI-Grounding
GUI Grounding for Professional High-Resolution Computer Use
☆383 · Jun 17, 2026 · Updated last month
Liac-li / MM-self-improve-qwen2vl
☆13 · Dec 9, 2024 · Updated last year
MobileAgentBench / mobile-agent-bench
☆37 · Sep 30, 2024 · Updated last year
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆90 · Jan 23, 2025 · Updated last year
belindal / LaMPP
Code for LaMPP: Language Models as Probabilistic Priors for Perception and Action
☆37 · Apr 3, 2023 · Updated 3 years ago
OS-Copilot / OS-Atlas
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
☆452 · Apr 20, 2025 · Updated last year
niuzaisheng / ScreenExplorer
ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World
☆26 · Jun 17, 2025 · Updated last year
google-deepmind / pix2act
☆60 · Jul 8, 2026 · Updated 3 weeks ago
google-research-datasets / rico_semantics
Consists of ~500k human annotations on the RICO dataset identifying various icons based on their shapes and semantics, and associations b…
☆36 · Jun 27, 2024 · Updated 2 years ago
HKUNLP / RSA
Retrieved Sequence Augmentation for Protein Representation Learning
☆52 · Nov 1, 2023 · Updated 2 years ago
CodeLLM-Research / CodeJudge-Eval
[COLING25] CodeJudge Eval: Can Large Language Models be Good Judges in Code Understanding?
☆12 · Dec 3, 2024 · Updated last year
gridaco / ui-dataset
A pre labelled dataset for ui element / layout detection
☆67 · Jun 15, 2023 · Updated 3 years ago
MuyeHuang / EvoChart
☆19 · Nov 3, 2025 · Updated 8 months ago
zorazrw / trove
[ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
☆33 · Sep 20, 2024 · Updated last year
xlang-ai / aguvis
[ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
☆389 · Mar 7, 2025 · Updated last year
zzxslp / MM-Navigator
GPT-4V in Wonderland: LMMs as Smartphone Agents
☆134 · Jul 17, 2024 · Updated 2 years ago
OSU-NLP-Group / Middleware
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)
☆37 · Dec 29, 2024 · Updated last year
Dongping-Chen / GUI-World
(ICLR 2025) The Official Code Repository for GUI-World.
☆69 · Dec 18, 2024 · Updated last year
stoneMo / OneAVM
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12 · Jun 1, 2023 · Updated 3 years ago
binbin00 / NJU_AdvancedProgramming_SP22
☆16 · Feb 20, 2022 · Updated 4 years ago
keeganhines / snowman
☆12 · Jun 24, 2017 · Updated 9 years ago
web-arena-x / visualwebarena
VisualWebArena is a benchmark for multimodal agents.
☆484 · Nov 9, 2024 · Updated last year
koalazf99 / Awesome-DataCentric-LLM
Trending projects & awesome papers about data-centric llm studies.
☆40 · May 20, 2025 · Updated last year
SAGNIKMJR / ego-AV-spatial-correspondence
[CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos'
☆14 · Jun 16, 2024 · Updated 2 years ago
google-research-datasets / screen_annotation
The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format, and desc…
☆93 · Mar 7, 2024 · Updated 2 years ago
tsinghua-fib-lab / UGI
Urban Generative Intelligence (UGI): A Foundational Platform for Embodied Agent and Future City
☆12 · Dec 17, 2023 · Updated 2 years ago