zai-org/CogAgent

zai-org / CogAgent

An open-sourced end-to-end VLM-based GUI Agent

☆1,187

Alternatives and similar repositories for CogAgent

Users that are interested in CogAgent are comparing it to the libraries listed below.

showlab / ShowUI
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
☆1,881 · Apr 24, 2026 · Updated 2 months ago
X-PLUG / MobileAgent
Mobile-Agent: The Powerful GUI Agent Family
☆8,957 · Jul 7, 2026 · Updated 2 weeks ago
xlang-ai / aguvis
[ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
☆389 · Mar 7, 2025 · Updated last year
bytedance / UI-TARS
Pioneering Automated GUI Interaction with Native Agents
☆11,202 · Jan 27, 2026 · Updated 5 months ago
njucckevin / SeeClick
The model, data and code for the visual GUI Agent SeeClick
☆490 · Jul 13, 2025 · Updated last year
AriaUI / Aria-UI
Open-sourced, Fast and Context-aware Action Grounding from GUI Instructions for GUI/Computer-use Agents
☆406 · Feb 8, 2025 · Updated last year
showlab / Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
☆1,197 · Aug 17, 2025 · Updated 11 months ago
zai-org / CogVLM
A state-of-the-art-level open visual language model | multimodal pre-trained model
☆6,742 · May 29, 2024 · Updated 2 years ago
microsoft / OmniParser
A simple screen parsing tool towards pure vision based GUI agent
☆25,179 · Updated this week
THUDM / WebRL
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
☆535 · Jun 6, 2025 · Updated last year
THUDM / AutoWebGLM
An LLM-based Web Navigating Agent (KDD'24)
☆930 · Sep 27, 2024 · Updated last year
niuzaisheng / ScreenAgent
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
☆606 · Nov 25, 2024 · Updated last year
TencentQQGYLab / AppAgent
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
☆6,814 · Mar 19, 2025 · Updated last year
zai-org / CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
☆2,436 · Mar 3, 2025 · Updated last year
OS-Copilot / OS-Atlas
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
☆452 · Apr 20, 2025 · Updated last year
showlab / WorldGUI
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
☆124 · Jul 27, 2025 · Updated 11 months ago
showlab / computer_use_ootb
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
☆1,953 · May 21, 2025 · Updated last year
OpenBMB / AgentCPM-GUI
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient…
☆1,392 · Jan 11, 2026 · Updated 6 months ago
QwenLM / Qwen-Agent
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
☆16,821 · Mar 4, 2026 · Updated 4 months ago
THUDM / Android-Lab
☆322 · Aug 18, 2025 · Updated 11 months ago
bytedance / UI-TARS-desktop
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
☆38,143 · Jul 1, 2026 · Updated 2 weeks ago
OSU-NLP-Group / GUI-Agents-Paper-List
Awesome GUI Agent Paper List
☆861 · Jun 28, 2026 · Updated 3 weeks ago
microsoft / GUI-Actor
[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
☆410 · Apr 13, 2026 · Updated 3 months ago
OS-Copilot / OS-Genesis
[ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
☆188 · Oct 8, 2025 · Updated 9 months ago
OpenGVLab / GUI-Odyssey
[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e…
☆159 · Jan 3, 2026 · Updated 6 months ago
OSU-NLP-Group / UGround
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
☆314 · Mar 11, 2026 · Updated 4 months ago
IMNearth / CoAT
Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)
☆103 · Oct 14, 2024 · Updated last year
OpenBMB / MiniCPM-V
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆25,948 · Jun 25, 2026 · Updated 3 weeks ago
microsoft / WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
☆881 · Apr 13, 2026 · Updated 3 months ago
VITA-MLLM / VITA
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
☆2,520 · Mar 28, 2025 · Updated last year
zai-org / GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs | open-source multilingual multimodal dialogue models
☆7,071 · Jul 4, 2025 · Updated last year
zai-org / GLM-V
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
☆2,356 · Updated this week
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,630 · Jan 30, 2026 · Updated 5 months ago
MiniMax-AI / MiniMax-01
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
☆3,445 · Jul 7, 2025 · Updated last year
camel-ai / owl
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
☆20,020 · Jul 10, 2026 · Updated last week
camel-ai / camel
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
☆17,447 · Updated this week
YuxiangChai / AMEX-codebase
☆33 · Sep 27, 2024 · Updated last year
Alibaba-NLP / DeepResearch
Tongyi Deep Research, the Leading Open-source Deep Research Agent
☆19,691 · Feb 27, 2026 · Updated 4 months ago
simular-ai / Agent-S
Agent S: an open agentic framework that uses computers like a human
☆12,039 · May 13, 2026 · Updated 2 months ago