The dataset includes screen summaries that describe the functionalities of Android app screenshots. It is used for training and evaluation of the screen2words models (our paper accepted by UIST'21 will be linked soon).
☆67 · Jul 27, 2021 · Updated 4 years ago
Alternatives and similar repositories for screen2words
Users who are interested in screen2words are comparing it to the libraries listed below.
- The dataset includes widget captions that describe the functionalities of UI elements. It is used for training and evaluation of the widget ca… ☆23 · Jun 24, 2021 · Updated 4 years ago
- It includes two datasets that are used in the downstream tasks for evaluating UIBert: App Similar Element Retrieval data and Visual Item … ☆47 · Aug 2, 2021 · Updated 4 years ago
- The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format, and desc… ☆89 · Mar 7, 2024 · Updated 2 years ago
- Screen2Vec is a new self-supervised technique for generating more comprehensive semantic embeddings of GUI screens and components using t… ☆84 · Feb 3, 2025 · Updated last year
- The dataset includes UI object type labels (e.g., BUTTON, IMAGE, CHECKBOX) that describe the semantic type of a UI object on Android ap… ☆54 · Jan 14, 2022 · Updated 4 years ago
- ☆32 · Sep 27, 2024 · Updated last year
- Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments ☆61 · Aug 19, 2024 · Updated last year
- A curated mobile app design database ☆69 · Sep 27, 2021 · Updated 4 years ago
- ☆17 · May 14, 2024 · Updated last year
- VINS: Visual Search for Mobile User Interface Design ☆51 · Jan 9, 2021 · Updated 5 years ago
- [WWW2024 Oral] Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering ☆15 · Apr 22, 2025 · Updated last year
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding" ☆29 · Jul 31, 2024 · Updated last year
- ☆23 · Oct 11, 2024 · Updated last year
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World". ☆27 · Aug 20, 2025 · Updated 8 months ago
- ☆36 · May 29, 2025 · Updated 11 months ago
- ☆132 · Dec 4, 2023 · Updated 2 years ago
- The ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K … ☆146 · Feb 7, 2025 · Updated last year
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e… ☆156 · Jan 3, 2026 · Updated 4 months ago
- This repository holds the data and code for the AndroR2 dataset of manually reproduced bug reports for Android apps ☆25 · Jun 11, 2021 · Updated 4 years ago
- Object Detection for Graphical User Interface: Old Fashioned or Deep Learning or a Combination? ☆129 · Feb 20, 2024 · Updated 2 years ago
- The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins… ☆19 · Nov 10, 2023 · Updated 2 years ago
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆65 · Oct 19, 2024 · Updated last year
- ☆30 · Apr 16, 2024 · Updated 2 years ago
- LINEBot ☆13 · Apr 7, 2025 · Updated last year
- ☆21 · Jan 30, 2026 · Updated 3 months ago
- Code for the paper "Prompt Engineering a Prompt Engineer" (https://arxiv.org/abs/2311.05661) ☆10 · Aug 1, 2024 · Updated last year
- Consists of ~500k human annotations on the RICO dataset identifying various icons based on their shapes and semantics, and associations b… ☆35 · Jun 27, 2024 · Updated last year
- ☆10 · Feb 8, 2021 · Updated 5 years ago
- ☆102 · Dec 22, 2023 · Updated 2 years ago
- ☆10 · Dec 21, 2020 · Updated 5 years ago
- Implementation of the ScreenAI model from the paper "A Vision-Language Model for UI and Infographics Understanding" ☆383 · Apr 13, 2026 · Updated 3 weeks ago
- Under construction ☆13 · Jan 15, 2025 · Updated last year
- The computer vision course at SUSTech ☆13 · Jan 5, 2020 · Updated 6 years ago
- This is the official repository of the paper "Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Schedulin… ☆14 · Jul 27, 2025 · Updated 9 months ago
- "In-Bed Pose Estimation: Deep Learning with Shallow Dataset" (JTEHM 2019) ☆11 · Jun 5, 2019 · Updated 6 years ago
- Unity AR Foundation demos with meshing functionality for iPhone 12 Pro and iPad Pro LiDAR ☆19 · Nov 11, 2020 · Updated 5 years ago
- A speed comparison between the GPUs offered by Google Colab and the MacBook M1 Max 24-core chip ☆10 · May 25, 2023 · Updated 2 years ago
- The model, data, and code for the visual GUI agent SeeClick ☆478 · Jul 13, 2025 · Updated 9 months ago
- Quality Metrics for evaluating the inter-cluster reliability of Multidimensional Projections ☆26 · Apr 30, 2023 · Updated 3 years ago