showlab / ShowUI

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

☆1,549

Alternatives and similar repositories for ShowUI

Users who are interested in ShowUI are comparing it to the repositories listed below.

zai-org / CogAgent
An open-source, end-to-end, VLM-based GUI Agent
☆1,082 Updated 7 months ago
showlab / computer_use_ootb
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
☆1,819 Updated 5 months ago
showlab / Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
☆968 Updated 2 months ago
microsoft / Magma
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
☆1,844 Updated last month
niuzaisheng / ScreenAgent
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
☆525 Updated 11 months ago
xlang-ai / aguvis
[ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
☆370 Updated 8 months ago
microsoft / WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
☆784 Updated 6 months ago
ranpox / awesome-computer-use
This is a collection of resources for computer-use GUI agents, including videos, blogs, papers, and projects.
☆453 Updated last week
njucckevin / SeeClick
The model, data and code for the visual GUI Agent SeeClick
☆436 Updated 4 months ago
MinorJerry / WebVoyager
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
☆957 Updated last year
likaixin2000 / ScreenSpot-Pro-GUI-Grounding
GUI Grounding for Professional High-Resolution Computer Use
☆277 Updated 2 weeks ago
OS-Copilot / OS-Atlas
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
☆397 Updated 6 months ago
kyegomez / ScreenAI
Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"
☆369 Updated 2 weeks ago
microsoft / GUI-Actor
[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
☆351 Updated 2 weeks ago
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,486 Updated 5 months ago
HumanMLLM / R1-Omni
☆969 Updated 7 months ago
google-research / android_world
AndroidWorld is an environment and benchmark for autonomous agents
☆508 Updated 2 weeks ago
Westlake-AGI-Lab / AppAgentX
Official implementation of AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
☆551 Updated 7 months ago
OSU-NLP-Group / UGround
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
☆284 Updated 3 months ago
OSU-NLP-Group / SeeAct
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…
☆797 Updated 9 months ago
THUDM / AutoWebGLM
An LLM-based Web Navigating Agent (KDD'24)
☆895 Updated last year
xlang-ai / OpenCUA
OpenCUA: Open Foundations for Computer-Use Agents
☆554 Updated last month
xingyaoww / code-act
Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan…
☆1,441 Updated last year
PKU-YuanGroup / LLaVA-CoT
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
☆2,091 Updated last week
OpenBMB / AgentCPM-GUI
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient…
☆1,102 Updated 5 months ago
ByteDance-Seed / m3-agent
☆1,082 Updated 3 weeks ago
VITA-MLLM / VITA
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
☆2,443 Updated 7 months ago
microsoft / SoM
[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs
☆1,476 Updated last year
OpenBMB / IoA
An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through in…
☆776 Updated last month
Alibaba-NLP / ZeroSearch
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
☆1,184 Updated 3 months ago