showlab / ShowUI
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
☆986 · Updated last week
Alternatives and similar repositories for ShowUI:
Users who are interested in ShowUI are comparing it to the repositories listed below:
- Out-of-the-box (OOTB) GUI Agent for Windows and macOS · ☆1,306 · Updated last week
- An open-sourced end-to-end VLM-based GUI Agent · ☆753 · Updated this week
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction · ☆221 · Updated last month
- 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents. · ☆490 · Updated 3 weeks ago
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents · ☆279 · Updated this week
- Agent S: an open agentic framework that uses computers like a human · ☆808 · Updated 3 weeks ago
- The model, data and code for the visual GUI Agent SeeClick · ☆312 · Updated 2 months ago
- ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24) · ☆397 · Updated 2 months ago
- This is a collection of resources for computer-use GUI agents, including videos, blogs, papers, and projects. · ☆229 · Updated this week
- Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents. · ☆588 · Updated 3 months ago
- Open-sourced, Fast and Context-aware Action Grounding from GUI Instructions for GUI/Computer-use Agents · ☆320 · Updated last week
- An LLM-based Web Navigating Agent (KDD'24) · ☆815 · Updated 4 months ago
- Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding" · ☆321 · Updated 3 weeks ago
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult… · ☆706 · Updated 2 weeks ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL · ☆302 · Updated last week
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents · ☆168 · Updated this week
- ☆1,326 · Updated 3 months ago
- Agent driven automation starting with the web. Try it: https://www.emergence.ai/web-automation-api · ☆1,026 · Updated 3 weeks ago
- An LLM-based Agent that predicts its tasks proactively. · ☆299 · Updated last month
- ✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction · ☆2,083 · Updated last week
- AndroidWorld is an environment and benchmark for autonomous agents · ☆212 · Updated last week
- An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through in… · ☆658 · Updated 4 months ago
- Scalable RL solution for advanced reasoning of language models · ☆1,262 · Updated this week
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. · ☆660 · Updated this week
- Parsing-free RAG supported by VLMs · ☆590 · Updated last month
- ☆2,176 · Updated last week
- Search-o1: Agentic Search-Enhanced Large Reasoning Models · ☆628 · Updated last week