GPT-4 Vision x Vimium = Autonomous Web Agent
★168 · Nov 16, 2023 · Updated 2 years ago
Alternatives and similar repositories for GPT-V-on-Web
Users that are interested in GPT-V-on-Web are comparing it to the libraries listed below
- AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI ★1,065 · Dec 9, 2024 · Updated last year
- Browse the web with GPT-4V and Vimium ★2,667 · Sep 25, 2024 · Updated last year
- A CLI speech recognition tool, using OpenAI Whisper, supports audio file transcription and near-realtime microphone input. ★22 · Mar 1, 2026 · Updated last week
- A web-based tool that utilizes GPT-4's vision capabilities to analyze and describe system architecture diagrams, providing instant insigh… ★16 · Nov 9, 2023 · Updated 2 years ago
- Web Scraping with GPT-4 Vision API and Puppeteer ★563 · Jan 31, 2024 · Updated 2 years ago
- Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents" ★42 · Nov 11, 2024 · Updated last year
- Website for the Open Interpreter project ★32 · Mar 22, 2024 · Updated last year
- Interact privately with your documents using the power of GPT, 100% privately, no data leaks ★10 · May 22, 2023 · Updated 2 years ago
- GPT-4V in Wonderland: LMMs as Smartphone Agents ★134 · Jul 17, 2024 · Updated last year
- A chat implementation for FastHTML ★11 · Sep 14, 2025 · Updated 5 months ago
- This AI Agent retrieves the latest news articles based on a multi keyword using the Serp API. It processes the results and returns struct… ★11 · Jan 31, 2025 · Updated last year
- Contains the model patches and the eval logs from the passing swe-bench-lite run. ★10 · Jun 28, 2024 · Updated last year
- Globot is an agent that controls your browser using playwright and GPT-4V. ★134 · Jan 4, 2024 · Updated 2 years ago
- A simple "Be My Eyes" web app with a llama.cpp/llava backend ★493 · Nov 28, 2023 · Updated 2 years ago
- Local & private voice controlled notepad using whisper.cpp ★26 · Jan 21, 2024 · Updated 2 years ago
- ★18 · Apr 3, 2023 · Updated 2 years ago
- An unofficial implementation of Tensor4D with support for the D-NeRF dataset ★13 · Nov 8, 2023 · Updated 2 years ago
- stream-of-consciousness experience of an AI's thinking process, complete with creative tangents and unexpected connections. ★14 · Jan 29, 2025 · Updated last year
- Various agents from all of the top agent frameworks to integrate into swarms! Langchain, Griptape, CrewAI, and more! ★18 · Dec 22, 2025 · Updated 2 months ago
- Build modern UIs in Jupyter with Python ★12 · Dec 28, 2022 · Updated 3 years ago
- Spacedrive native dependencies ★13 · Apr 8, 2025 · Updated 11 months ago
- Trading strategy based on the Maximum Pain Theory ★10 · Aug 27, 2016 · Updated 9 years ago
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult… ★830 · Feb 3, 2025 · Updated last year
- ★14 · Feb 17, 2024 · Updated 2 years ago
- Anthropic MCP client for macOS ★16 · Jan 5, 2025 · Updated last year
- The decentralized storage application for accelerating AI innovation ★17 · Apr 9, 2024 · Updated last year
- Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models ★15 · Jan 3, 2024 · Updated 2 years ago
- Allows dictating anywhere in Windows using AutoHotKey and OpenAI's Whisper speech-to-text engine. ★13 · Feb 21, 2024 · Updated 2 years ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation ★144 · Oct 30, 2024 · Updated last year
- [NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" -- the first LLM-based web agent and benchmark for generalist w… ★952 · Nov 5, 2025 · Updated 4 months ago
- ★63 · Sep 23, 2024 · Updated last year
- A2A MCP Server is a lightweight Python bridge that lets Claude Desktop or any MCP client talk to A2A agents. It provides three tools: reg… ★21 · May 4, 2025 · Updated 10 months ago
- Allows issuing voice commands in Windows via AutoHotKey scripts generated by ChatGPT. ★14 · Jan 5, 2025 · Updated last year
- Markdown to ANSI in TypeScript based on Micro-Mark, with support for URLs, tables, lists and more. ★39 · Feb 28, 2026 · Updated last week
- Manage your ever-growing list of research papers ★13 · Nov 19, 2023 · Updated 2 years ago
- End-to-end deployment of Azure OpenAI applications using Redis Enterprise as a vector database. ★16 · Sep 18, 2023 · Updated 2 years ago
- ★15 · Jul 9, 2025 · Updated 8 months ago
- A collection of utilities for FastHTML projects. ★14 · Oct 23, 2024 · Updated last year
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel… ★12 · Jan 29, 2024 · Updated 2 years ago