showlab / Awesome-GUI-Agent
A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
☆175 · Updated last week
Related projects
Alternatives and complementary repositories for Awesome-GUI-Agent
- The model, data and code for the visual GUI Agent SeeClick ☆215 · Updated 2 months ago
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆64 · Updated 3 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆113 · Updated last week
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆78 · Updated 3 months ago
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆196 · Updated 3 months ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024) ☆42 · Updated 3 weeks ago
- Official Repo for UGround ☆93 · Updated this week
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?" ☆163 · Updated 2 months ago
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness ☆230 · Updated this week
- ☆85 · Updated 3 months ago
- ☆339 · Updated last month
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(… ☆242 · Updated this week
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆230 · Updated last month
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆132 · Updated 3 weeks ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆98 · Updated 2 weeks ago
- Official repo for the paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning. ☆252 · Updated last month
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… ☆353 · Updated 3 weeks ago
- ☆10 · Updated last week
- ☆152 · Updated 4 months ago
- Align Anything: Training All-modality Model with Feedback ☆220 · Updated this week
- Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi e… ☆346 · Updated last month
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆300 · Updated 6 months ago
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,… ☆110 · Updated 2 weeks ago
- ☆117 · Updated last week
- ☆23 · Updated 6 months ago
- Environments, tools, and benchmarks for general computer agents ☆171 · Updated 2 weeks ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆147 · Updated this week
- An RLHF Infrastructure for Vision-Language Models ☆98 · Updated 4 months ago
- A Survey on Benchmarks of Multimodal Large Language Models ☆59 · Updated 3 weeks ago
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ☆199 · Updated last month