YXB-NKU / SE-GUI
[NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
☆89 · Updated 3 months ago
Alternatives and similar repositories for SE-GUI
Users interested in SE-GUI are comparing it to the libraries listed below.
- [AAAI 2026] Release of code, datasets, and model for our work TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for General… ☆66 · Updated last month
- ☆90 · Updated last year
- Pixel-Level Reasoning Model trained with RL [NeurIPS 2025] ☆267 · Updated 2 months ago
- [CVPR 2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆232 · Updated 2 months ago
- Enable AI to control your PC. This repo includes the WorldGUI Benchmark and the GUI-Thinker Agent Framework. ☆107 · Updated 6 months ago
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning" ☆110 · Updated last month
- Code for the paper "Reinforced Vision Perception with Tools" ☆69 · Updated 3 months ago
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos ☆63 · Updated 4 months ago
- Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w… ☆97 · Updated 4 months ago
- [EMNLP 2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆72 · Updated 2 months ago
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆145 · Updated 6 months ago
- Official code for the NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images" ☆172 · Updated 2 weeks ago
- Official code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search" ☆394 · Updated 4 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆137 · Updated 5 months ago
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆152 · Updated 3 months ago
- Official repository of "DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models" ☆88 · Updated last year
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆95 · Updated 8 months ago
- GroundCUA ☆65 · Updated last month
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆77 · Updated last year
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆203 · Updated last year
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆104 · Updated 4 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆88 · Updated last year
- ☆42 · Updated 6 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS 2025] ☆179 · Updated 7 months ago
- [ICLR 2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆93 · Updated last month
- A collection of awesome "think with videos" papers. ☆83 · Updated last month
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents ☆217 · Updated 8 months ago
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision ☆27 · Updated 8 months ago
- Official implementation of MIA-DPO ☆70 · Updated last year
- [ICLR 2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆94 · Updated last year