xlang-ai/aguvis

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/xlang-ai/aguvis)

xlang-ai / aguvis

[ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

☆389

Alternatives and similar repositories for aguvis

Users that are interested in aguvis are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

OSU-NLP-Group / UGround
View on GitHub
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
☆316Mar 11, 2026Updated 4 months ago
OS-Copilot / OS-Atlas
View on GitHub
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
☆452Apr 20, 2025Updated last year
njucckevin / SeeClick
View on GitHub
The model, data and code for the visual GUI Agent SeeClick
☆493Jul 13, 2025Updated last year
xlang-ai / AgentTrek
View on GitHub
[ICLR2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
☆60Feb 21, 2025Updated last year
xlang-ai / OSWorld-G
View on GitHub
[NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis
☆172Jun 18, 2026Updated last month
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
AriaUI / Aria-UI
View on GitHub
Open-sourced, Fast and Context-aware Action Grounding from GUI Instructions for GUI/Computer-use Agents
☆407Feb 8, 2025Updated last year
likaixin2000 / ScreenSpot-Pro-GUI-Grounding
View on GitHub
GUI Grounding for Professional High-Resolution Computer Use
☆383Jun 17, 2026Updated last month
xlang-ai / OSWorld
View on GitHub
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
☆3,045Updated this week
Yan98 / GTA1
View on GitHub
☆130Oct 3, 2025Updated 9 months ago
microsoft / GUI-Actor
View on GitHub
[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
☆410Apr 13, 2026Updated 3 months ago
OS-Copilot / OS-Genesis
View on GitHub
[ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
☆189Oct 8, 2025Updated 9 months ago
ranpox / awesome-computer-use
View on GitHub
This is a collection of resources for computer-use GUI agents, including videos, blogs, papers, and projects.
☆572Apr 15, 2026Updated 3 months ago
showlab / Awesome-GUI-Agent
View on GitHub
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
☆1,201Aug 17, 2025Updated 11 months ago
showlab / macosworld
View on GitHub
☆35Jan 28, 2026Updated 6 months ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
IMNearth / CoAT
View on GitHub
Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)
☆103Oct 14, 2024Updated last year
RUCBM / GUICourse
View on GitHub
GUICourse: From General Vision Langauge Models to Versatile GUI Agents
☆143Mar 1, 2026Updated 4 months ago
showlab / ShowUI
View on GitHub
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
☆1,887Apr 24, 2026Updated 3 months ago
THUDM / Android-Lab
View on GitHub
☆324Aug 18, 2025Updated 11 months ago
lll6gg / UI-R1
View on GitHub
[AAAI-2026] Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning"
☆158Nov 24, 2025Updated 8 months ago
zai-org / CogAgent
View on GitHub
An open-sourced end-to-end VLM-based GUI Agent
☆1,190Apr 4, 2025Updated last year
open-compass / MMBench-GUI
View on GitHub
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w…
☆113Sep 8, 2025Updated 10 months ago
THUDM / WebRL
View on GitHub
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
☆535Jun 6, 2025Updated last year
OSU-NLP-Group / Mind2Web
View on GitHub
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" -- the first LLM-based web agent and benchmark for generalist w…
☆1,016Nov 5, 2025Updated 8 months ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
GAIR-NLP / PC-Agent
View on GitHub
PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World
☆313May 21, 2025Updated last year
InfiXAI / InfiGUI-R1
View on GitHub
Repository for the paper "InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners"
☆67Dec 4, 2025Updated 7 months ago
Dongping-Chen / GUI-World
View on GitHub
(ICLR 2025) The Official Code Repository for GUI-World.
☆69Dec 18, 2024Updated last year
google-research / android_world
View on GitHub
AndroidWorld is an environment and benchmark for autonomous agents
☆835Jul 16, 2026Updated last week
xlang-ai / Spider2-V
View on GitHub
[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
☆153Aug 26, 2024Updated last year
microsoft / WindowsAgentArena
View on GitHub
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
☆883Apr 13, 2026Updated 3 months ago
chuyg1005 / seeclick-crawler
View on GitHub
☆20Apr 24, 2024Updated 2 years ago
YXB-NKU / SE-GUI
View on GitHub
[NeurIPS 2025]"Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
☆108Oct 21, 2025Updated 9 months ago
OSU-NLP-Group / GUI-Agents-Paper-List
View on GitHub
Awesome GUI Agent Paper List
☆866Jun 28, 2026Updated last month
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
ritzz-ai / GUI-R1
View on GitHub
Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
☆252May 5, 2025Updated last year
DigiRL-agent / digirl
View on GitHub
Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
☆393Feb 22, 2025Updated last year
xlang-ai / OpenCUA
View on GitHub
[NeurIPS 2025 Spotlight] OpenCUA: Open Foundations for Computer-Use Agents
☆806May 25, 2026Updated 2 months ago
lgy0404 / LearnAct
View on GitHub
[ACL 2026] LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark
☆48Apr 18, 2026Updated 3 months ago
showlab / WorldGUI
View on GitHub
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
☆124Jul 27, 2025Updated last year
OpenGVLab / ZeroGUI
View on GitHub
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
☆119Jul 17, 2025Updated last year
xlang-ai / computer-agent-arena
View on GitHub
[ICLR 2026] Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents
☆67Feb 26, 2026Updated 5 months ago