xlang-ai/computer-agent-arena

xlang-ai / computer-agent-arena

[ICLR 2026] Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents

☆67

Alternatives and similar repositories for computer-agent-arena

Users interested in computer-agent-arena are comparing it to the repositories listed below.

xlang-ai / VideoAgentTrek
The official repo of VideoAgentTrek
☆57 · Oct 24, 2025 · Updated 9 months ago
xlang-ai / OSWorld-G
[NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis
☆172 · Jun 18, 2026 · Updated last month
kiaia / GIRAFFE
Extending context length of visual language models
☆12 · Dec 18, 2024 · Updated last year
xlang-ai / CUA-Gym
Scalable pipeline for synthesizing verifiable RLVR training data for computer-use agents
☆180 · May 26, 2026 · Updated 2 months ago
HKUNLP / SymGen
[EMNLP'23] Code for Generating Data for Symbolic Language with Large Language Models
☆18 · Oct 21, 2023 · Updated 2 years ago
WukLab / osworld-human
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents
☆27 · May 17, 2026 · Updated 2 months ago
SumilerGAO / SunGen
☆28 · Feb 26, 2023 · Updated 3 years ago
xlang-ai / OSWorld-V2
OSWorld 2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks
☆201 · Updated this week
niuzaisheng / ScreenExplorer
ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World
☆26 · Jun 17, 2025 · Updated last year
HKUNLP / ProGen
[EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”.
☆27 · Feb 4, 2023 · Updated 3 years ago
Timothyxxx / TestTimeTrainingPapers
☆59 · Apr 13, 2026 · Updated 3 months ago
xijia-tao / ImgTrojan
Code and data for "ImgTrojan: Jailbreaking Vision-Language Models with ONE Image"
☆24 · Mar 26, 2025 · Updated last year
PathOnAIOrg / awesome-ai-agents
Collection of Materials on AI Agents
☆45 · Feb 11, 2026 · Updated 5 months ago
zhaoxlpku / DynaAct
☆15 · Nov 12, 2025 · Updated 8 months ago
iafiscal1212 / diffuse-cpp
High-performance C++ inference engine for Diffusion Language Models (LLaDA, SEDD, MDLM)
☆16 · Apr 5, 2026 · Updated 3 months ago
OS-Copilot / OS-Genesis
[ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
☆188 · Oct 8, 2025 · Updated 9 months ago
taogoddd / GPT-4V-API
Self-hosted GPT-4V API
☆27 · Nov 6, 2023 · Updated 2 years ago
OS-Copilot / OS-Sentinel
[ACL 2026] Code, benchmark and environment for "OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic…
☆49 · Jul 5, 2026 · Updated 2 weeks ago
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆66 · Jul 8, 2024 · Updated 2 years ago
ServiceNow / webarena-verified
A verified version of the WebArena Benchmark
☆46 · Mar 8, 2026 · Updated 4 months ago
JIA-Lab-research / ARPO
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
☆162 · May 29, 2025 · Updated last year
SebastianBodza / EnsembleForecasting
Using multiple LLMs for ensemble forecasting
☆16 · Jan 17, 2024 · Updated 2 years ago
RUCBM / GUICourse
GUICourse: From General Vision Language Models to Versatile GUI Agents
☆143 · Mar 1, 2026 · Updated 4 months ago
THUDM / SCALE-CUA
Open-source framework for computer use agents: VeriGen verifiable task synthesis, online RL training (AgentRL), and OSWorld/ScienceBoard …
☆33 · Updated this week
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆127 · May 6, 2025 · Updated last year
ZuyiZhou / Awesome-Interpretable-Cross-modal-Reasoning
A Survey on Interpretable Cross-modal Reasoning
☆15 · Oct 12, 2023 · Updated 2 years ago
xlang-ai / FineVLA
Scalable annotation pipeline for action-aligned, fine-grained instructions for Visual-Language-Action models
☆73 · Updated this week
xlang-ai / aguvis
[ICML 2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
☆389 · Mar 7, 2025 · Updated last year
xwhan / pylucene-bm25
Lucene open-domain QA retrieval in Python
☆11 · Feb 18, 2021 · Updated 5 years ago
zhaoxlpku / SubgoalXL
☆26 · Aug 23, 2024 · Updated last year
ranpox / openreview-visualization
OpenReview Submission Visualization (ICLR 2024/2025)
☆153 · Oct 17, 2024 · Updated last year
xlang-ai / AgentNetTool
This is the official code base of AgentNetTool in OpenCUA. Website: https://opencua.xlang.ai/
☆52 · Sep 3, 2025 · Updated 10 months ago
Timothyxxx / EnvInteractiveLMPapers
Paper collection on methods that use language to interact with environments, including interaction with the real world, simulated worlds, or the WWW…
☆128 · Jul 26, 2023 · Updated 3 years ago
HKUNLP / RSA
Retrieved Sequence Augmentation for Protein Representation Learning
☆52 · Nov 1, 2023 · Updated 2 years ago
xlang-ai / AgentTrek
[ICLR 2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
☆60 · Feb 21, 2025 · Updated last year
will-singularity / Skywork-MM
Empirical Study Towards Building An Effective Multi-Modal Large Language Model
☆22 · Oct 25, 2023 · Updated 2 years ago
yxuansu / Awesome_Diffusions
☆17 · Feb 20, 2023 · Updated 3 years ago
cogito233 / llm-bivariate-causal-discovery
☆13 · Aug 18, 2022 · Updated 3 years ago
OSU-NLP-Group / SeeAct
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…
☆851 · Feb 3, 2025 · Updated last year