Westlake-AGI-Lab / AppAgentX
Official implementation of AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
☆249 Updated 3 weeks ago
Alternatives and similar repositories for AppAgentX:
Users that are interested in AppAgentX are comparing it to the libraries listed below
- An open-sourced end-to-end VLM-based GUI Agent ☆845 Updated last month
- ☆49 Updated 4 months ago
- PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World ☆217 Updated 3 months ago
- Open-sourced, Fast and Context-aware Action Grounding from GUI Instructions for GUI/Computer-use Agents ☆343 Updated last month
- [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use. ☆1,127 Updated 2 weeks ago
- ☆218 Updated last month
- 🌐 WebWalker: Benchmarking LLMs in Web Traversal ☆378 Updated last week
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆278 Updated last week
- An LLM-based Agent that predicts its tasks proactively. ☆337 Updated last week
- An LLM-based Web Navigating Agent (KDD'24) ☆828 Updated 6 months ago
- VisionTasker introduces a novel two-stage framework combining vision-based UI understanding and LLM task planning for mobile task automat… ☆64 Updated last month
- AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire pro… ☆202 Updated 8 months ago
- ☆188 Updated 7 months ago
- ☆310 Updated 3 months ago
- This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. “面壁小钢炮” focuses on achi… ☆223 Updated 5 months ago
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆395 Updated last week
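- Use free LLM APIs together with your private-domain data to generate SFT training data (entirely free of charge); supports the training-data formats of tools such as llamafactory. Synthetic data ☆154 Updated 4 months ago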
- FlexRAG: A RAG Framework for Information Retrieval and Generation. ☆135 Updated this week
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction ☆268 Updated 3 weeks ago
- AndroidWorld is an environment and benchmark for autonomous agents ☆250 Updated this week
- Repo for NAACL 2025 Paper "Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization" ☆263 Updated 2 months ago
- ☆743 Updated this week
- The model, data and code for the visual GUI Agent SeeClick ☆343 Updated 4 months ago
- 🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/ ☆316 Updated 4 months ago
- ☆199 Updated this week
- [ICLR 2025] The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation" ☆133 Updated last month
- Source code for the paper "Empowering LLM to use Smartphone for Intelligent Task Automation" ☆335 Updated last year
- ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24) ☆422 Updated 4 months ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024) ☆80 Updated 5 months ago
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking ☆429 Updated this week