ZJU-REAL / GUI-RCPO
[AAAI 2026] Test-Time Reinforcement Learning for GUI Grounding via Region Consistency https://arxiv.org/abs/2508.05615
☆54Updated 2 months ago
Alternatives and similar repositories for GUI-RCPO
Users who are interested in GUI-RCPO are comparing it to the libraries listed below:
- ☆36Updated 3 months ago
- [NeurIPS 2025] Let LRMs Break Free from Overthinking via Self-Braking Tuning. https://arxiv.org/abs/2505.14604 ☆54Updated 2 months ago
- ☆195Updated 2 weeks ago
- ☆32Updated 5 months ago
- ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization☆95Updated 7 months ago
- Collection of model-centric MCP servers☆24Updated 7 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks☆91Updated 6 months ago
- ☆73Updated 7 months ago
- ☆145Updated 5 months ago
- The code and data of We-Math 2.0.☆163Updated 4 months ago
- 🌟Official code of our AAAI26 paper 🔍WebFilter☆33Updated 2 months ago
- [NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents☆369Updated 2 months ago
- Reading List of Memory Augmented Multimodal Research, including multimodal context modeling, memory in vision and robotics, and external …☆52Updated last year
- The official repository of "R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Integration"☆131Updated 4 months ago
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis☆173Updated 3 months ago
- Official Repository for PosterGen☆202Updated 2 weeks ago
- This is the code repo for our paper "Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts".☆41Updated 3 months ago
- Reinforcement Learning of Vision Language Models with Self Visual Perception Reward☆157Updated 3 months ago
- ☆32Updated 5 months ago
- Efficient Agent Training for Computer Use☆135Updated 4 months ago
- [ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant☆43Updated last year
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…☆375Updated 4 months ago
- Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning☆43Updated 2 weeks ago
- [NeurIPS'25] Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning☆108Updated last week
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."☆143Updated 5 months ago
- Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework☆35Updated 5 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆98Updated last year
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation☆118Updated 7 months ago
- Code implementation for DyG-RAG: Dynamic Graph Retrieval-Augmented Generation with Event-Centric Reasoning.☆43Updated 4 months ago
- ☆18Updated last year