ritzz-ai / GUI-R1
Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents
⭐178 · Updated 4 months ago
Alternatives and similar repositories for GUI-R1
Users interested in GUI-R1 are comparing it to the repositories listed below.
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ⭐155 · Updated last week
- MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight) ⭐60 · Updated 2 months ago
- A Self-Training Framework for Vision-Language Reasoning ⭐83 · Updated 7 months ago
- Paper collections of multi-modal LLM for Math/STEM/Code. ⭐126 · Updated last month
- Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning" ⭐127 · Updated 3 months ago
- Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w… ⭐76 · Updated last week
- ⭐215 · Updated last week
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ⭐87 · Updated last month
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models. ⭐144 · Updated 5 months ago
- ⭐104 · Updated 2 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ⭐122 · Updated 3 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ⭐143 · Updated 3 months ago
- An RLHF Infrastructure for Vision-Language Models ⭐183 · Updated 10 months ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ⭐80 · Updated 3 months ago
- Repository for the paper "InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners" ⭐59 · Updated 3 months ago
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e… ⭐128 · Updated last month
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ⭐152 · Updated last week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ⭐107 · Updated 3 months ago
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ⭐157 · Updated 2 weeks ago
- OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images. ⭐299 · Updated 3 months ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ⭐310 · Updated 3 weeks ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ⭐273 · Updated this week
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond ⭐289 · Updated last month
- Towards Large Multimodal Models as Visual Foundation Agents ⭐236 · Updated 4 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ⭐326 · Updated 2 months ago
- ⭐82 · Updated last year
- Repo for the paper https://arxiv.org/abs/2504.13837 ⭐190 · Updated 2 months ago
- GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning ⭐162 · Updated 3 months ago
- Official Repository of "Learning what reinforcement learning can't" ⭐65 · Updated last week
- ⭐264 · Updated 2 months ago