ZJU-REAL / GUI-RCPO
[AAAI 2026] Test-Time Reinforcement Learning for GUI Grounding via Region Consistency https://arxiv.org/abs/2508.05615
☆49Updated last week
Alternatives and similar repositories for GUI-RCPO
Users who are interested in GUI-RCPO are comparing it to the repositories listed below.
- [NeurIPS 2025] Let LRMs Break Free from Overthinking via Self-Braking Tuning. https://arxiv.org/abs/2505.14604 ☆50Updated last week
- This is the code repo for our paper "Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts".☆40Updated last month
- ☆36Updated last month
- ☆32Updated 4 months ago
- Collection of model-centric MCP servers☆24Updated 5 months ago
- ☆137Updated 3 months ago
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization"☆89Updated 5 months ago
- ☆170Updated this week
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆38Updated last year
- The open-source code of MetaStone-S1.☆107Updated 3 months ago
- The official repository of "R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Integration"☆121Updated 2 months ago
- [NeurIPS'25] Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning☆80Updated last month
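- The simplest reproduction of R1 results on small models, explaining the most important essence of O1-like models and DeepSeek R1. Think is all your need. Backed by experiments: for strong reasoning ability, the content of the thinking process is the core of AGI/ASI.☆44Updated 9 months ago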
- (ICLR 2025) The Official Code Repository for GUI-World.☆67Updated 10 months ago
- [EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs☆57Updated 2 months ago
- ☆67Updated 7 months ago
- Reproducible Language Agent Research☆29Updated 4 months ago
- ☆15Updated last year
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models.☆73Updated 11 months ago
- Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework☆29Updated 3 months ago
- ☆35Updated 9 months ago
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."☆133Updated 3 months ago
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM"☆16Updated 3 weeks ago
- 🌟Official code of our AAAI26 paper 🔍WebFilter☆30Updated last week
- Official code repository for the paper "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind"☆20Updated last month
- Official Repository for PosterGen☆176Updated 3 weeks ago
- [ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant☆41Updated 10 months ago
- [NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents☆351Updated 2 weeks ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆92Updated last year
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆33Updated last year