ddupont808 / GPT-4V-Act
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
☆1,034 Updated 4 months ago
Alternatives and similar repositories for GPT-4V-Act:
Users who are interested in GPT-4V-Act are comparing it to the repositories listed below:
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult… ☆737 Updated 2 months ago
- [NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" ☆816 Updated 2 weeks ago
- Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents" ☆970 Updated 2 months ago
- Set-of-Mark Prompting for GPT-4V and LMMs ☆1,357 Updated 8 months ago
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks. ☆1,643 Updated 7 months ago
- [IJCAI 2024] Generate different roles for GPTs to form a collaborative entity for complex tasks. ☆1,354 Updated last year
- ☆1,029 Updated last year
- Create browser automation as if you were teaching a human using GPT-4 Vision. ☆580 Updated last year
- Agent driven automation starting with the web. Try it: https://www.emergence.ai/web-automation-api ☆1,089 Updated 2 months ago
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments ☆1,778 Updated this week
- 🌎💪 BrowserGym, a Gym environment for web task automation