MetaAgentX / OpenCaptchaWorld
[NeurIPS 2025] The first web-based benchmark and platform to evaluate visual reasoning and interaction capabilities of MLLM powered agents through diverse and dynamic CAPTCHA puzzles.
☆56 Updated last month
Alternatives and similar repositories for OpenCaptchaWorld
Users who are interested in OpenCaptchaWorld are comparing it to the repositories listed below.
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆177 Updated 3 months ago
- ☆73 Updated 8 months ago
- ✨✨Latest Papers and Datasets on Mobile and PC GUI Agent ☆149 Updated last year
- [NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents ☆376 Updated 3 months ago
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆297 Updated 6 months ago
- ☆254 Updated last week
- [NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆145 Updated 3 months ago
- WritingBench: A Comprehensive Benchmark for Generative Writing ☆156 Updated last month
- [AAAI 2026] Test-Time Reinforcement Learning for GUI Grounding via Region Consistency https://arxiv.org/abs/2508.05615 ☆58 Updated 2 months ago
- ☆297 Updated 5 months ago
- [ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant ☆44 Updated last year
- [ICLR 2026] Efficient Agent Training for Computer Use ☆135 Updated 5 months ago
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups ☆50 Updated last year
- (ICLR 2025) The Official Code Repository for GUI-World. ☆68 Updated last year
- [ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction ☆379 Updated 11 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆256 Updated 9 months ago
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ☆120 Updated 8 months ago
- Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon! ☆52 Updated 10 months ago
- ☆122 Updated 4 months ago
- ☆82 Updated 10 months ago
- ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization ☆95 Updated 8 months ago
- [ACL 2025] GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent ☆58 Updated 8 months ago
- Official implementation of UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning ☆63 Updated last month
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆148 Updated 8 months ago
- ☆95 Updated last year
- ZeroGUI: Automating Online GUI Learning at Zero Human Cost ☆107 Updated 6 months ago
- [AAAI 2026] The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants ☆46 Updated last month
- Repository for the paper "InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners" ☆64 Updated 2 months ago
- [NeurIPS 2025] A multimodal agent that can interact with its own PC in a multimodal manner. ☆36 Updated 2 months ago
- This is the official repository of the paper "Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Schedulin… ☆13 Updated 6 months ago