ai-agents-2030 / SPA-Bench
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
☆30 Updated last week
Alternatives and similar repositories for SPA-Bench:
Users who are interested in SPA-Bench are comparing it to the repositories listed below.
- ☆29 Updated 7 months ago
- Code for "UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning" ☆88 Updated this week
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 Updated 5 months ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024) ☆85 Updated 6 months ago
- ☆40 Updated last year
- A comprehensive collection of process reward models. ☆74 Updated last week
- ☆132 Updated 4 months ago
- ☆111 Updated this week
- Towards Large Multimodal Models as Visual Foundation Agents ☆209 Updated last week
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆138 Updated 6 months ago
- ☆19 Updated 6 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 Updated 6 months ago
- ☆29 Updated 7 months ago
- Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆130 Updated last month
- ☆144 Updated last month
- GitHub page for "Large Language Model-Brained GUI Agents: A Survey" ☆149 Updated last week
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆112 Updated 2 weeks ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆71 Updated last week
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆118 Updated last month
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆135 Updated 4 months ago
- A research repo for experiments about Reinforcement Finetuning ☆46 Updated 3 weeks ago
- Building a comprehensive and handy list of papers for GUI agents ☆313 Updated this week
- ☆168 Updated last month
- ☆138 Updated this week
- ✨✨ Latest Papers and Datasets on Mobile and PC GUI Agent ☆124 Updated 5 months ago
- Repository for the paper "InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners" ☆23 Updated last week
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆109 Updated 5 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model ☆131 Updated 4 months ago
- ☆115 Updated last week
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g … ☆34 Updated last month