The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format and describe the UI elements present on the screen: their type, location, OCR text and a short description. It was introduced in the paper `ScreenAI: A Vision-Language Model for UI and Infographics Understanding`.
☆91 · Mar 7, 2024 · Updated 2 years ago
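As a rough illustration of the annotation contents described above (type, location, OCR text, short description), the following minimal Python sketch models one screen as a list of UI elements. The class name `UIElement`, its field names, the bounding-box ordering, and the sample values are all assumptions made for illustration; they are not taken from the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UIElement:
    """One annotated UI element. Field names are illustrative assumptions."""
    ui_type: str                      # e.g. "BUTTON", "TEXT", "IMAGE"
    bbox: Tuple[int, int, int, int]   # assumed order: (x_min, x_max, y_min, y_max)
    ocr_text: Optional[str] = None    # text read from the element, if any
    description: Optional[str] = None # short description, if any


def elements_of_type(elements: List[UIElement], ui_type: str) -> List[UIElement]:
    """Return the elements whose type matches ui_type exactly."""
    return [e for e in elements if e.ui_type == ui_type]


# A made-up screen, not a record from the dataset.
screen = [
    UIElement("TEXT", (40, 960, 20, 80), ocr_text="Settings"),
    UIElement("BUTTON", (700, 960, 900, 980), ocr_text="Save"),
    UIElement("IMAGE", (40, 300, 120, 380), description="a profile photo"),
]

buttons = elements_of_type(screen, "BUTTON")
print([b.ocr_text for b in buttons])
```

A structured form like this makes it easy to filter or count elements per screen; mapping the dataset's real text annotations onto it would require a parser written against the actual format.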
Alternatives and similar repositories for screen_annotation
Users that are interested in screen_annotation are comparing it to the libraries listed below.
- The ScreenQA dataset was introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots". It contains ~86K … ☆150 · Feb 7, 2025 · Updated last year
- The dataset includes screen summaries that describe the functionality of Android app screenshots. It is used for training and evaluation of … ☆67 · Jul 27, 2021 · Updated 4 years ago
- Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding" ☆383 · May 11, 2026 · Updated last month
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e… ☆158 · Jan 3, 2026 · Updated 5 months ago
- The model, data and code for the visual GUI Agent SeeClick ☆483 · Jul 13, 2025 · Updated 10 months ago
- ☆34 · Oct 1, 2024 · Updated last year
- Consists of ~500k human annotations on the RICO dataset identifying various icons based on their shapes and semantics, and associations b… ☆36 · Jun 27, 2024 · Updated last year
- Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments ☆61 · Aug 19, 2024 · Updated last year
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆68 · Oct 19, 2024 · Updated last year
- ☆132 · Dec 4, 2023 · Updated 2 years ago
- Unofficial PyTorch Implementation of Improving Inference for Neural Image Compression ☆15 · Apr 27, 2025 · Updated last year
- ☆20 · Apr 24, 2024 · Updated 2 years ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024) ☆103 · Oct 14, 2024 · Updated last year
- A pre-labelled dataset for UI element / layout detection ☆67 · Jun 15, 2023 · Updated 2 years ago
- Official code accompanying the arXiv paper Compressing Multisets with Large Alphabets ☆30 · Sep 22, 2021 · Updated 4 years ago
- [NeurIPS 2025] UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents ☆58 · Nov 27, 2025 · Updated 6 months ago
- Official Implementation of CL-ALFRED (ICLR'24) ☆32 · Oct 24, 2024 · Updated last year
- Custom object detection for UI of the design system using TensorFlow ☆17 · Jun 20, 2023 · Updated 2 years ago
- Code and data for the paper "Large language models for automatic equation discovery of nonlinear dynamics" ☆13 · Mar 6, 2025 · Updated last year
- [CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning" ☆39 · Nov 11, 2025 · Updated 7 months ago
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆317 · Mar 11, 2026 · Updated 3 months ago
- Code repository for the paper "Learning partial differential equations for biological transport models from noisy spatiotemporal data" ☆10 · Jul 3, 2019 · Updated 6 years ago
- [EMNLP 2022] The baseline code for META-GUI dataset ☆15 · Jul 9, 2024 · Updated last year
- Code for the paper "Closing the Curious Case of Neural Text Degeneration" ☆12 · Apr 9, 2025 · Updated last year
- ☆36 · Nov 22, 2022 · Updated 3 years ago
- Object Detection for Graphical User Interface: Old Fashioned or Deep Learning or a Combination? ☆128 · Feb 20, 2024 · Updated 2 years ago
- Official Project Webpage for paper "DiffSRL: Learning Dynamic-aware State Representation for Control via Differentiable Simulation" ☆12 · Apr 4, 2022 · Updated 4 years ago
- Graph Convolutional Module for Temporal Action Localization in Videos ☆10 · Jul 4, 2020 · Updated 5 years ago
- Dataset for EMNLP'23 Paper "DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading" ☆11 · Oct 25, 2023 · Updated 2 years ago
- A recipe for constituency parsing, disfluency tagging and obtaining the fluent transcripts of English Fisher dataset ☆13 · May 2, 2021 · Updated 5 years ago
- MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding ☆79 · Feb 27, 2025 · Updated last year
- Graph Key Information Extraction: GKIE ☆11 · Sep 15, 2022 · Updated 3 years ago
- ☆20 · Jan 22, 2026 · Updated 4 months ago
- [NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" -- the first LLM-based web agent and benchmark for generalist w… ☆1,000 · Nov 5, 2025 · Updated 7 months ago
- Genius Rank - entry by team 对对队 for the 3rd Qiniu Cloud 1024 Creation Festival ☆18 · Dec 7, 2024 · Updated last year
- Submission for MICCAI HACKATHON: https://miccai-hackathon.com/#participate ☆15 · Jul 19, 2023 · Updated 2 years ago
- [ICCV 2025] LIRA ☆22 · Nov 25, 2025 · Updated 6 months ago
- Conversational Recommender System Evaluation via Simulation ☆19 · Jun 2, 2026 · Updated last week
- GUI Grounding for Professional High-Resolution Computer Use ☆374 · Apr 14, 2026 · Updated last month