showlab / ShowUI
★10 · Updated this week
Related projects
Alternatives and complementary repositories for ShowUI
- 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents. ★200 · Updated this week
- A Universal Platform for Training and Evaluation of Mobile Interaction ★37 · Updated last week
- GUICourse: From General Vision Language Models to Versatile GUI Agents ★83 · Updated 4 months ago
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ★69 · Updated last week
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ★131 · Updated 2 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ★122 · Updated last week
- ★65 · Updated last year
- Official implementation of WebVLN: Vision-and-Language Navigation on Websites ★23 · Updated 10 months ago
- Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming. ★264 · Updated 6 months ago
- [NIPS24W] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated… ★73 · Updated 4 months ago
- ★61 · Updated last month
- The model, data and code for the visual GUI Agent SeeClick ★226 · Updated 2 months ago
- The Official Code Repository for GUI-World. ★41 · Updated 3 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ★98 · Updated 6 months ago
- [ICLR 2024] Source codes for the paper "Building Cooperative Embodied Agents Modularly with Large Language Models" ★225 · Updated 3 weeks ago
- ★58 · Updated 9 months ago
- STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment ★30 · Updated 10 months ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024) ★48 · Updated last month
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ★48 · Updated 3 weeks ago
- SceneGenAgent: Precise Industrial Scene Generation with Coding Agent ★11 · Updated 3 weeks ago
- ★120 · Updated last month
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ★198 · Updated 4 months ago
- ★40 · Updated 11 months ago
- ★23 · Updated 7 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024) ★184 · Updated this week
- ★11 · Updated 6 months ago
- Official Repo for UGround ★97 · Updated 2 weeks ago
- [CVPR2024] This is the official implementation of MP5 ★84 · Updated 4 months ago
- A repository accompanying the PARTNR benchmark for using Large Planning Models (LPMs) to solve Human-Robot Collaboration or Robot Instruc… ★67 · Updated 2 weeks ago
- ★27 · Updated 5 months ago