google-research-datasets / seq2act
This repository contains the open-source version of the datasets used for different parts of training and testing models that ground natural language to UI actions, as described in the paper "Mapping Natural Language Instructions to Mobile UI Action Sequences" by Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, and Jason Baldridge, which is acc…
☆32 · Updated 4 years ago
Alternatives and similar repositories for seq2act:
Users interested in seq2act are comparing it to the libraries listed below.
- Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments ☆60 · Updated 8 months ago
- A Universal Platform for Training and Evaluation of Mobile Interaction ☆44 · Updated 2 months ago
- Seq2act: Mapping Natural Language Instructions to Mobile UI Action Sequences, from Google Research ☆14 · Updated 4 years ago
- The dataset includes screen summaries that describe Android app screenshots' functionalities. It is used for training and evaluation of … ☆56 · Updated 3 years ago
- The dataset includes UI object type labels (e.g., BUTTON, IMAGE, CHECKBOX) that describe the semantic type of a UI object on Android ap… ☆50 · Updated 3 years ago
- [EMNLP 2022] The baseline code for the META-GUI dataset ☆13 · Updated 9 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆113 · Updated 9 months ago
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆109 · Updated 5 months ago
- Consists of ~500k human annotations on the RICO dataset identifying various icons based on their shapes and semantics, and associations b… ☆27 · Updated 10 months ago
- ☆17 · Updated last year
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆233 · Updated 9 months ago
- ☆29 · Updated 7 months ago
- (ICLR 2025) The Official Code Repository for GUI-World. ☆54 · Updated 4 months ago
- [ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents ☆16 · Updated 6 months ago
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation ☆57 · Updated 8 months ago
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models ☆57 · Updated last year
- NaturalCodeBench (Findings of ACL 2024) ☆64 · Updated 6 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆56 · Updated 3 months ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024) ☆85 · Updated 6 months ago
- The dataset includes widget captions that describe UI elements' functionalities. It is used for training and evaluation of the widget ca… ☆21 · Updated 3 years ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 · Updated 6 months ago
- [NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? ☆123 · Updated 8 months ago
- ☆40 · Updated 9 months ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral, ACL 2024 SRW ☆59 · Updated 7 months ago
- An Illusion of Progress? Assessing the Current State of Web Agents ☆40 · Updated 2 weeks ago
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval ☆81 · Updated 7 months ago
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding" ☆26 · Updated 9 months ago
- ☆34 · Updated 10 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆138 · Updated 6 months ago
- Repository for the paper "InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners" ☆23 · Updated last week