lll6gg / UI-R1
Code for "UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning"
☆88Updated this week
Alternatives and similar repositories for UI-R1:
Users who are interested in UI-R1 are comparing it to the repositories listed below
- Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis☆130Updated last month
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)☆85Updated 6 months ago
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr…☆109Updated 5 months ago
- Towards Large Multimodal Models as Visual Foundation Agents☆209Updated last week
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆120Updated 3 weeks ago
- ☆168Updated last month
- ✨✨Latest Papers and Datasets on Mobile and PC GUI Agent☆124Updated 5 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆135Updated 4 months ago
- A journey to real multimodal R1! We are running large-scale experiments☆297Updated last month
- ☆111Updated this week
- ☆115Updated last week
- Building a comprehensive and handy list of papers for GUI agents☆313Updated this week
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents☆215Updated this week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆100Updated 2 months ago
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning☆441Updated last week
- MMR1: Advancing the Frontiers of Multimodal Reasoning☆159Updated last month
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models☆121Updated last week
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆113Updated 9 months ago
- A Self-Training Framework for Vision-Language Reasoning☆77Updated 3 months ago
- Code for NeurIPS 2024 paper "AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning"☆41Updated 5 months ago
- A curated collection of resources, tools, and frameworks for developing GUI Agents.☆37Updated 2 weeks ago
- GitHub page for "Large Language Model-Brained GUI Agents: A Survey"☆149Updated last week
- An RLHF Infrastructure for Vision-Language Models☆173Updated 5 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆120Updated 3 weeks ago
- ✨First Open-Source R1-like Video-LLM [2025/02/18]☆331Updated 2 months ago
- Explore the Multimodal “Aha Moment” on 2B Model☆583Updated last month
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning☆590Updated this week
- ☆29Updated 7 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback☆276Updated 7 months ago
- A comprehensive collection of process reward models.☆74Updated last week