xlang-ai / computer-agent-arena
Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon!
☆44 · Updated last month
Alternatives and similar repositories for computer-agent-arena:
Users who are interested in computer-agent-arena are comparing it to the repositories listed below:
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆77 · Updated this week
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆101 · Updated 2 months ago
- An Open Math Pre-training Dataset with 370B Tokens. ☆80 · Updated last month
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆90 · Updated 2 months ago
- ☆121 · Updated this week
- Challenges for general-purpose web-browsing AI agents ☆47 · Updated 2 months ago
- ☆63 · Updated last month
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆35 · Updated 2 months ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆56 · Updated 3 weeks ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆80 · Updated last month
- An Illusion of Progress? Assessing the Current State of Web Agents ☆42 · Updated this week
- ☆50 · Updated last week
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆36 · Updated 4 months ago
- Code for ScribeAgent paper ☆57 · Updated 2 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆94 · Updated 3 weeks ago
- ☆92 · Updated 3 months ago
- ☆25 · Updated 7 months ago
- ☆38 · Updated 4 months ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆186 · Updated 3 weeks ago
- Agent Skill Induction: "Inducing Programmatic Skills for Agentic Tasks" ☆14 · Updated 2 weeks ago
- ☆46 · Updated last week
- Code and Data for "Language Modeling with Editable External Knowledge" ☆32 · Updated 10 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆135 · Updated 5 months ago
- ☆62 · Updated last month
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆53 · Updated last month
- ☆20 · Updated 11 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆51 · Updated 4 months ago
- Reformatted Alignment ☆115 · Updated 7 months ago
- ☆68 · Updated last month
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" ☆60 · Updated last month