opendilab / awesome-ui-agents
A curated list of awesome UI agents resources, encompassing Web, App, OS, and beyond (continually updated)
☆139Updated last week
Alternatives and similar repositories for awesome-ui-agents:
Users who are interested in awesome-ui-agents are comparing it to the repositories listed below
- PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements. (e.g. MBTI Measurement Agent)☆159Updated last month
- Building an open-ended embodied agent in a battle royale FPS game☆37Updated last year
- Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis☆94Updated 3 weeks ago
- Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.☆303Updated 2 months ago
- Towards Large Multimodal Models as Visual Foundation Agents☆179Updated last week
- ✨✨Latest Papers and Datasets on Mobile and PC GUI Agent☆102Updated 2 months ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)☆72Updated 4 months ago
- The Code Repo for Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization☆104Updated 5 months ago
- Python library for solving reinforcement learning (RL) problems using generative models (e.g. Diffusion Models).☆112Updated last month
- The model, data and code for the visual GUI Agent SeeClick☆308Updated 2 months ago
- Building a comprehensive and handy list of papers for GUI agents☆200Updated 3 weeks ago
- CodeMorpheus: Generate code self-portraits with one click (decision-making AI + generative AI)☆46Updated last year
- 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.☆484Updated 2 weeks ago
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr…☆87Updated 3 months ago
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)☆219Updated 7 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks☆60Updated last month
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning☆276Updated 2 months ago
- A Universal Platform for Training and Evaluation of Mobile Interaction☆41Updated last month
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness☆291Updated 2 months ago
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction☆210Updated last month
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆141Updated last month
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents☆160Updated this week
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆98Updated 6 months ago
- ☆186Updated 2 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"☆140Updated 3 weeks ago
- AndroidWorld is an environment and benchmark for autonomous agents☆209Updated this week
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆103Updated 11 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆288Updated 6 months ago
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents☆192Updated last week
- A curated list of Multi-Modal Reinforcement Learning resources (continually updated)☆431Updated last week