THUDM / CogAgent
An open-sourced end-to-end VLM-based GUI Agent
★837 · Updated last month
Alternatives and similar repositories for CogAgent:
Users who are interested in CogAgent are comparing it to the repositories listed below.
- An LLM-based Web Navigating Agent (KDD'24) ★828 · Updated 5 months ago
- [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use. ★1,117 · Updated last week
- WebWalker: Benchmarking LLMs in Web Traversal ★376 · Updated this week
- An LLM-based agent that predicts its tasks proactively. ★332 · Updated this week
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction ★265 · Updated 2 weeks ago
- Build & Optimize your RAG. ★564 · Updated last week
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking ★421 · Updated last week
- Open-sourced, Fast and Context-aware Action Grounding from GUI Instructions for GUI/Computer-use Agents ★340 · Updated last month
- Parsing-free RAG supported by VLMs ★636 · Updated last month
- Search-o1: Agentic Search-Enhanced Large Reasoning Models ★719 · Updated 2 weeks ago
- LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA ★480 · Updated 2 months ago
- An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through in… ★684 · Updated 5 months ago
- 🦀 CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/ ★313 · Updated 4 months ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ★330 · Updated last month
- Repo for NAACL 2025 Paper "Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization" ★262 · Updated 2 months ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ★274 · Updated this week
- PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World ★215 · Updated 2 months ago
- AgentNetworkProtocol(ANP) is an open source protocol for agent communication. Our vision is to define how agents connect with each other,… ★297 · Updated last week
- An Open Large Reasoning Model for Real-World Solutions ★1,475 · Updated 2 weeks ago
- [ICLR 2025] Agent S: an open agentic framework that uses computers like a human ★1,321 · Updated this week
- ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24) ★420 · Updated 3 months ago
- Profile-Based Long-Term Memory for AI Applications ★913 · Updated this week
- "MiniRAG: Making RAG Simpler with Small and Free Language Models" ★860 · Updated last week