njucckevin / SeeClick
The model, data and code for the visual GUI Agent SeeClick
☆226Updated 2 months ago
Related projects
Alternatives and complementary repositories for SeeClick
- 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.☆200Updated this week
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)☆198Updated 4 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆83Updated 4 months ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)☆48Updated last month
- Towards Large Multimodal Models as Visual Foundation Agents☆120Updated this week
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr…☆69Updated last week
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…☆355Updated last week
- ☆348Updated last month
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"☆162Updated 2 months ago
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness☆241Updated 2 weeks ago
- ☆154Updated 2 weeks ago
- VisualWebArena is a benchmark for multimodal agents.☆244Updated last week
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback☆235Updated 2 months ago
- A Universal Platform for Training and Evaluation of Mobile Interaction☆37Updated last week
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step☆231Updated 7 months ago
- Official Repo for UGround☆97Updated last week
- (CVPR2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.☆315Updated 4 months ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)☆267Updated 2 weeks ago
- MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts☆239Updated 2 months ago
- This is the official repository for Retrieval Augmented Visual Question Answering☆182Updated 2 months ago
- ☆193Updated 6 months ago
- ☆89Updated 3 months ago
- ☆152Updated 4 months ago
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(…☆246Updated last week
- ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)☆315Updated 2 months ago
- ☆196Updated 11 months ago
- ☆116Updated 5 months ago
- Environments, tools, and benchmarks for general computer agents☆172Updated 3 weeks ago
- MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU☆334Updated 11 months ago
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.☆114Updated 2 months ago