THUDM / CogAgent
An open-sourced end-to-end VLM-based GUI Agent
☆513 Updated last week
Alternatives and similar repositories for CogAgent:
Users who are interested in CogAgent are comparing it to the libraries listed below.
- Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use. ☆841 Updated this week
- An LLM-based Agent that predicts its tasks proactively. ☆277 Updated last week
- An LLM-based Web Navigating Agent (KDD'24) ☆791 Updated 3 months ago
- LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA ☆451 Updated 2 weeks ago
- ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24) ☆358 Updated last month
- An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through in… ☆630 Updated 2 months ago
- ☆368 Updated last month
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆205 Updated this week
- ☆196 Updated last month
- Parsing-free RAG supported by VLMs ☆552 Updated this week
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆595 Updated last month
- Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi e… ☆380 Updated last month
- Agent S: an open agentic framework that uses computers like a human ☆749 Updated this week
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆282 Updated 3 weeks ago
- ☆1,137 Updated last month
- The First Multimodal Search Engine Pipeline and Benchmark for LMMs ☆407 Updated last month
- 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents. ☆428 Updated this week
- WebDesignAgent: Towards Effortless Website Creation ☆243 Updated 3 months ago
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding ☆797 Updated this week
- A repo with an automated prompt engineering workflow from scratch. It leverages the OPRO technique. ☆172 Updated 4 months ago
- The model, data and code for the visual GUI Agent SeeClick ☆283 Updated last month
- An Open Large Reasoning Model for Real-World Solutions ☆1,378 Updated last month
- ☆158 Updated last month
- ✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction ☆1,932 Updated this week
- Open-sourced, Fast and Context-aware Action Grounding from GUI Instructions for GUI/Computer-use Agents ☆262 Updated this week
- Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents. ☆557 Updated last month
- AI for all: Build the large graph of the language models ☆250 Updated 7 months ago
- FlexRAG: A RAG Framework for Information Retrieval and Generation. ☆103 Updated this week
- An open platform for enhancing the capability of LLMs in workflow orchestration. ☆89 Updated last month
- A Python-native agent framework ☆437 Updated last month