njucckevin / SeeClick
The model, data and code for the visual GUI Agent SeeClick
☆349 · Updated 4 months ago
Alternatives and similar repositories for SeeClick:
Users interested in SeeClick are comparing it to the repositories listed below.
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆195 · Updated last week
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024) ☆81 · Updated 5 months ago
- 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents. ☆596 · Updated 3 weeks ago
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆231 · Updated 8 months ago
- Building a comprehensive and handy list of papers for GUI agents ☆269 · Updated 3 weeks ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆195 · Updated last month
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction ☆270 · Updated 3 weeks ago
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆98 · Updated 4 months ago
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents ☆311 · Updated last month
- ☆206 · Updated last week
- Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆118 · Updated last week
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆106 · Updated 8 months ago
- GitHub page for "Large Language Model-Brained GUI Agents: A Survey" ☆139 · Updated this week
- ☆419 · Updated 6 months ago
- AndroidWorld is an environment and benchmark for autonomous agents ☆256 · Updated this week
- VisualWebArena is a benchmark for multimodal agents. ☆320 · Updated 4 months ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆342 · Updated last month
- Official repo for the paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning. ☆340 · Updated last month
- ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K … ☆107 · Updated last month
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents ☆198 · Updated last month
- ✨✨ Latest Papers and Datasets on Mobile and PC GUI Agent ☆117 · Updated 4 months ago
- A Universal Platform for Training and Evaluation of Mobile Interaction ☆44 · Updated last month
- [ACL 2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step ☆265 · Updated last year
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆285 · Updated 2 weeks ago
- GUI Grounding for Professional High-Resolution Computer Use ☆161 · Updated last month
- [ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning ☆218 · Updated 2 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… ☆407 · Updated 3 weeks ago
- [CVPR'25] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ☆333 · Updated last month
- Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi e… ☆437 · Updated 3 weeks ago
- Explore the Multimodal "Aha Moment" on 2B Model ☆538 · Updated 2 weeks ago