☆125 · Oct 3, 2025 · Updated 5 months ago
Alternatives and similar repositories for GTA1
Users who are interested in GTA1 are comparing it to the repositories listed below.
- [NeurIPS 2025] A multimodal agent that can interact with its own PC in a multimodal manner. ☆35 · Feb 25, 2026 · Updated last week
- ☆21 · May 3, 2025 · Updated 10 months ago
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆300 · Jul 18, 2025 · Updated 7 months ago
- ☆25 · Jan 28, 2026 · Updated last month
- [NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆150 · Nov 6, 2025 · Updated 3 months ago
- This is the official code base of AgentNetTool in OpenCUA. Website: https://opencua.xlang.ai/ ☆39 · Sep 3, 2025 · Updated 6 months ago
- ☆31 · Jul 3, 2025 · Updated 8 months ago
- OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents ☆21 · Jan 6, 2026 · Updated last month
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆148 · May 29, 2025 · Updated 9 months ago
- For ACL25 paper "WAFFLE: Multi-Modal Model for Automated Front-End Development" - by Shanchao Liang and Nan Jiang and Shangshu Qian and L… ☆11 · May 28, 2025 · Updated 9 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 8 months ago
- R1-like Computer-use Agent ☆89 · Mar 21, 2025 · Updated 11 months ago
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents ☆437 · Apr 20, 2025 · Updated 10 months ago
- This is the official repository of the paper "Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Schedulin… ☆13 · Jul 27, 2025 · Updated 7 months ago
- ☆16 · Jun 10, 2025 · Updated 8 months ago
- ☆14 · Mar 11, 2025 · Updated 11 months ago
- ☆42 · Sep 15, 2025 · Updated 5 months ago
- More reliable Video Understanding Evaluation ☆14 · Sep 23, 2025 · Updated 5 months ago
- Karras et al. (2022) diffusion models for PyTorch ☆17 · Oct 5, 2023 · Updated 2 years ago
- [NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning" ☆97 · Oct 21, 2025 · Updated 4 months ago
- A collaborative agent-based workflow designed for the NL2Vis task ☆19 · Mar 6, 2025 · Updated 11 months ago
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel… ☆12 · Jan 29, 2024 · Updated 2 years ago
- [ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents ☆26 · Feb 17, 2026 · Updated 2 weeks ago
- 🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games" ☆24 · Dec 14, 2025 · Updated 2 months ago
- [NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents ☆385 · Feb 11, 2026 · Updated 3 weeks ago
- A simple visual test-time scaling method for GUI agent grounding ☆20 · Dec 7, 2025 · Updated 2 months ago
- A minimal MCP Server based on Anthropic's "think" tool research ☆23 · Aug 1, 2025 · Updated 7 months ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆99 · May 20, 2025 · Updated 9 months ago
- Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w… ☆100 · Sep 8, 2025 · Updated 5 months ago
- [ACL 2025 Findings] Text2World: Benchmarking Large Language Models for Symbolic World Model Generation ☆28 · Feb 25, 2025 · Updated last year
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) ☆19 · Jul 1, 2025 · Updated 8 months ago
- Implementation of SmoothCache, a project aimed at speeding up Diffusion Transformer (DiT) based GenAI models with error-guided caching. ☆48 · Jul 17, 2025 · Updated 7 months ago
- Benchmark of complex, multimodal desktop-oriented tasks for advanced GUI-navigation AI agents ☆24 · May 7, 2025 · Updated 9 months ago
- ☆73 · May 23, 2025 · Updated 9 months ago
- Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents. ☆826 · Feb 11, 2026 · Updated 3 weeks ago
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆180 · Oct 8, 2025 · Updated 4 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆51 · Feb 22, 2026 · Updated last week
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology ☆75 · Jan 26, 2026 · Updated last month
- [AAAI 2026] GUI-G²: Gaussian Reward Modeling for GUI Grounding ☆303 · Feb 2, 2026 · Updated last month