ddupont808 / GPT-4V-Act
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
☆983 Updated 2 months ago
Related projects
Alternatives and complementary repositories for GPT-4V-Act
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult… ☆647 Updated last week
- Set-of-Mark Prompting for GPT-4V and LMMs ☆1,185 Updated 3 months ago
- [NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" ☆716 Updated 3 months ago
- Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents" ☆753 Updated last month
- Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models" ☆342 Updated 8 months ago
- Create browser automation as if you were teaching a human using GPT-4 Vision. ☆564 Updated 9 months ago
- MultiOn API ☆449 Updated 3 months ago
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks. ☆1,524 Updated 2 months ago
- Agent driven automation starting with the web. Discord: https://discord.gg/wgNfmFuqJF ☆818 Updated this week
- [IJCAI 2024] Generate different roles for GPTs to form a collaborative entity for complex tasks. ☆1,212 Updated 7 months ago
- Common interface for interacting with AI agents. The protocol is tech stack agnostic - you can use it with any framework for building age… ☆997 Updated 5 months ago
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… ☆496 Updated 5 months ago
- Video Search and Streaming Agent 🕵️♂️ ☆437 Updated 9 months ago
- Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs)… ☆992 Updated this week
- Python & JS/TS SDK for running AI-generated code/code interpreting in your AI app ☆1,265 Updated this week
- Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding" ☆294 Updated last week
- Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents. ☆483 Updated this week
- Code for our ACL 2023 Paper "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models". ☆600 Updated last year
- The Open Source Memory Layer For Autonomous Agents