CCAI-Lab / Awesome-GUI-Agents

A curated collection of resources, tools, and frameworks for developing GUI Agents.

☆42

Alternatives and similar repositories for Awesome-GUI-Agents

Users who are interested in Awesome-GUI-Agents are comparing it to the libraries listed below.


njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆77 · Updated 3 months ago
ADaM-BJTU / Mind_with_eyes_Awesome_MLLMs_Reasoning
This repository is continuously updated with the latest papers, technical reports, and benchmarks on multimodal reasoning.
☆37 · Updated last month
ShadeCloak / ADORA
☆43 · Updated last month
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆67 · Updated 3 months ago
ritzz-ai / GUI-R1
Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents
☆76 · Updated last week
1zhou-Wang / MemVR
[ICML 2025] Official implementation of the paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in…
☆50 · Updated this week
John-AI-Lab / NoisyRollout
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆53 · Updated last week
IDEA-FinAI / ChartMoE
[ICLR 2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
☆78 · Updated last month
Open-DataFlow / Awesome_MLLMs_Reasoning
☆95 · Updated last month
zhishuifeiqian / VCR-Bench
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
☆26 · Updated last month
OpenGVLab / V2PE
[arXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆47 · Updated 5 months ago
mat-agent / MAT-Agent
☆32 · Updated 3 weeks ago
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆100 · Updated 2 months ago
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆56 · Updated 10 months ago
Wild-Cooperation-Hub / Awesome-MLLM-Reasoning-Benchmarks
A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.
☆59 · Updated last month
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆94 · Updated 2 months ago
lll6gg / UI-R1
Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning"
☆92 · Updated this week
RyanLiu112 / Awesome-Process-Reward-Models
A comprehensive collection of process reward models.
☆76 · Updated last week
GuangyanS / Sys2-LLaVA
☆24 · Updated 3 months ago
LightChen233 / M3CoT
☆73 · Updated 11 months ago
jungao1106 / ICoT
[CVPR '25] Interleaved-Modal Chain-of-Thought
☆39 · Updated 3 weeks ago
YuxiangChai / AMEX-codebase
☆29 · Updated 7 months ago
OpenRLHF / OpenRLHF-M
An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.
☆120 · Updated last month
ZichenWen1 / DART
Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"
☆40 · Updated 2 weeks ago
RifleZhang / LLaVA-Reasoner-DPO
☆75 · Updated 4 months ago
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"
☆90 · Updated last week
ModalMinds / MM-PRM
MM-PRM: An open implementation of Multimodal OmegaPRM and its corresponding training pipeline
☆13 · Updated last month
OS-Copilot / OS-Genesis
Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
☆131 · Updated this week
UCSC-VLAA / VLAA-Thinking
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆104 · Updated 3 weeks ago
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP 2024 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆79 · Updated 3 months ago