showlab / Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
☆336 Updated last week
Alternatives and similar repositories for Awesome-GUI-Agent:
Users who are interested in Awesome-GUI-Agent are comparing it to the libraries listed below:
- The model, data and code for the visual GUI Agent SeeClick ☆248 Updated 3 weeks ago
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents ☆200 Updated 3 weeks ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆255 Updated this week
- Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning. ☆279 Updated 3 weeks ago
- This is a collection of resources for computer-use agents, including videos, blogs, papers, and projects. ☆135 Updated last month
- Environments, tools, and benchmarks for general computer agents ☆184 Updated last month
- Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi e… ☆366 Updated 3 months ago
- Official Repo for UGround ☆116 Updated last month
- AndroidWorld is an environment and benchmark for autonomous agents ☆151 Updated this week
- ☆174 Updated 3 weeks ago
- Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent ☆640 Updated this week
- ☆369 Updated 2 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆142 Updated 3 weeks ago
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆206 Updated 5 months ago
- ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24) ☆335 Updated 3 weeks ago
- AN O1 REPLICATION FOR CODING ☆222 Updated last week
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆92 Updated 5 months ago
- AWM: Agent Workflow Memory ☆218 Updated 3 weeks ago
- ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K … ☆96 Updated 4 months ago
- VisualWebArena is a benchmark for multimodal agents. ☆258 Updated last month
- ☆23 Updated 8 months ago
- ☆998 Updated 3 weeks ago
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult… ☆665 Updated last month
- ☆586 Updated 2 weeks ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… ☆366 Updated last week
- 🏝️ OASIS: Open Agent Social Interaction Simulations with One Million Agents. https://oasis.camel-ai.org ☆192 Updated last week
- Large Reasoning Models ☆718 Updated 2 weeks ago
- HPT - Open Multimodal LLMs from HyperGAI ☆313 Updated 6 months ago
- ☆289 Updated 2 months ago
- ✨✨Latest Papers and Datasets on Mobile and PC GUI Agent ☆73 Updated 2 weeks ago