MAI-UI: Real-World Centric Foundation GUI Agents ranging from 2B to 235B
☆1,765 · Mar 20, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for MAI-UI
Users that are interested in MAI-UI are comparing it to the libraries listed below.
- Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments ☆170 · Updated this week
- Mobile-Agent: The Powerful GUI Agent Family ☆8,408 · Mar 31, 2026 · Updated last week
- STEP-GUI: The top GUI agent solution in the galaxy. Developed by the StepFun-GELab team and powered by StepFun’s cutting-edge research c… ☆2,108 · Mar 14, 2026 · Updated 3 weeks ago
- AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient… ☆1,343 · Jan 11, 2026 · Updated 2 months ago
- Official implementation of UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning ☆72 · Dec 30, 2025 · Updated 3 months ago
- EvoCUA: Evolving Computer Use Agent ☆303 · Mar 31, 2026 · Updated last week
- An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone ☆24,698 · Mar 6, 2026 · Updated last month
- The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra ☆29,213 · Mar 27, 2026 · Updated last week
- CodeVibes is an intelligent AI-powered code analysis tool that scans your GitHub repositories to uncover security vulnerabilities, bugs a… ☆58 · Jan 12, 2026 · Updated 2 months ago
- Pioneering Automated GUI Interaction with Native Agents ☆10,024 · Jan 27, 2026 · Updated 2 months ago
- ☆10 · Feb 14, 2025 · Updated last year
- Automate your mobile devices with natural language commands - an LLM agnostic mobile Agent 🤖 ☆8,107 · Updated this week
- AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps. ☆6,632 · Mar 19, 2025 · Updated last year
- The model, data and code for the visual GUI Agent SeeClick ☆477 · Jul 13, 2025 · Updated 8 months ago
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat… ☆1,567 · Jun 14, 2025 · Updated 9 months ago
- Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w… ☆103 · Sep 8, 2025 · Updated 7 months ago
- Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search" ☆414 · Jan 29, 2026 · Updated 2 months ago
- Use AI to instantly summarize websites' terms of service and highlight any concerning elements ☆17 · Apr 5, 2025 · Updated last year
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆15,889 · Mar 4, 2026 · Updated last month
- LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm. ☆13,772 · Updated this week
- GroundCUA ☆117 · Mar 24, 2026 · Updated 2 weeks ago
- A simple screen parsing tool towards pure vision based GUI agent ☆24,619 · Sep 12, 2025 · Updated 6 months ago
- Make YouTube videos readable. Local-first Markdown summaries with Ollama, with cloud providers support. ☆63 · Dec 28, 2025 · Updated 3 months ago
- Run Claude Code/Codex within AgentFS, orchestrated by LlamaIndex Workflows ☆318 · Dec 19, 2025 · Updated 3 months ago
- [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use. ☆1,766 · Jan 20, 2026 · Updated 2 months ago
- Youtu-Tip: Tap for Intelligence, Keep on Device. ☆591 · Feb 27, 2026 · Updated last month
- ☆35 · Jan 12, 2026 · Updated 2 months ago
- ☆32 · Jul 3, 2025 · Updated 9 months ago
- A self-hosted, browser-based AI CSV analyzer ☆75 · Apr 2, 2026 · Updated last week
- No fortress, purely open ground. OpenManus is Coming. ☆55,599 · Feb 11, 2026 · Updated last month
- [ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆375 · Mar 30, 2026 · Updated last week
- Agent S: an open agentic framework that uses computers like a human ☆10,761 · Feb 21, 2026 · Updated last month
- Universal memory layer for AI Agents ☆52,137 · Updated this week
- Tongyi Deep Research, the Leading Open-source Deep Research Agent ☆18,612 · Feb 27, 2026 · Updated last month
- AI-powered, vision-driven UI automation for every platform. ☆12,503 · Updated this week
- A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone ☆24,255 · Mar 7, 2026 · Updated last month
- An AI agent development platform with all-in-one visual tools, simplifying agent creation, debugging, and deployment like never before. C… ☆20,439 · Apr 3, 2026 · Updated last week
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆18,917 · Jan 30, 2026 · Updated 2 months ago
- A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines ☆5,506 · Apr 2, 2026 · Updated last week