showlab / assistgui
☆28 Updated last year
Alternatives and similar repositories for assistgui
Users that are interested in assistgui are comparing it to the libraries listed below
- The model, data and code for the visual GUI Agent SeeClick ☆399 Updated this week
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆119 Updated 8 months ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024) ☆91 Updated 9 months ago
- Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework. ☆85 Updated 2 weeks ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆119 Updated last year
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ☆224 Updated 7 months ago
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆262 Updated last month
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?" ☆229 Updated last month
- (CVPR2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆343 Updated 6 months ago
- Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning" ☆120 Updated last month
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(… ☆289 Updated 8 months ago
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ☆144 Updated 10 months ago
- VisualWebArena is a benchmark for multimodal agents. ☆357 Updated 8 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆284 Updated 10 months ago
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆244 Updated 8 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆221 Updated 2 months ago
- 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents. ☆783 Updated last month
- The official repository of "Video assistant towards large language model makes everything easy"