Halluminate / WebBench
Benchmark your browser agent on ~2.5k READ and ACTION based tasks
★71 · Updated 3 months ago
Alternatives and similar repositories for WebBench
Users who are interested in WebBench are comparing it to the libraries listed below.
- A toolkit for building computer use AI agents · ★177 · Updated 4 months ago
- ★107 · Updated last week
- Routing on Random Forest (RoRF) · ★218 · Updated last year
- Headless IDE for AI agents · ★201 · Updated last month
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) · ★88 · Updated this week
- Structured Data Extractor for AI Agents. Search your documents or the web for specific data and get it back in JSON or Markdown in a sing… · ★179 · Updated 7 months ago
- proof-of-concept of Cursor's Instant Apply feature · ★84 · Updated last year
- Not Diamond Python SDK · ★88 · Updated last week
- Efficient computer use agent powered by Meta Llama 4 Maverick · ★45 · Updated 6 months ago
- ★142 · Updated 8 months ago
- Build AI Agents with Your Existing Python Code! · ★67 · Updated last year
- A better way of testing, inspecting, and analyzing AI Agent traces. · ★40 · Updated 2 weeks ago
- The theory of mind module for the SWE agent · ★28 · Updated 2 weeks ago
- MarinaBox is a toolkit for creating and managing secure, isolated environments for AI agents · ★140 · Updated last week
- Anthropic Computer Use with Modal Sandboxes · ★41 · Updated last year
- Letting Claude Code develop his own MCP tools :) · ★123 · Updated 8 months ago
- Run Surfer-H agents powered by Holo1 using the Surfer-H-CLI. Includes example tasks, scripts, and configurations. · ★138 · Updated last month
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles · ★59 · Updated 6 months ago
- Open Agent Computer Interface · ★87 · Updated 11 months ago
- ★47 · Updated last year
- A Python library that uses Reinforcement Learning (RL) to train LLMs. · ★42 · Updated 3 months ago
- ★89 · Updated 9 months ago
- ★35 · Updated 3 months ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… · ★70 · Updated this week
- ★119 · Updated 2 weeks ago
- ★55 · Updated 2 months ago
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents! · ★130 · Updated last month
- An open-source debugging agent in VSCode · ★79 · Updated last year
- The first AI voice assistant that interrupts *you* · ★148 · Updated last year
- A framework for orchestrating AI agents using a mermaid graph · ★77 · Updated last year