Halluminate / WebBench
📚 Benchmark your browser agent on ~2.5k READ and ACTION based tasks
☆77 · Updated 4 months ago
Alternatives and similar repositories for WebBench
Users who are interested in WebBench are comparing it to the libraries listed below.
- Routing on Random Forest (RoRF) ☆226 · Updated last year
- 🤖 Headless IDE for AI agents ☆200 · Updated last month
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆89 · Updated this week
- ☆107 · Updated last month
- [NAACL2025] LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications ☆131 · Updated 4 months ago
- ☆84 · Updated last year
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆60 · Updated 6 months ago
- Agent computer interface for AI software engineer. ☆114 · Updated 2 months ago
- OSS RL environment + evals toolkit ☆203 · Updated this week
- Deprecated Browserbase Python SDK ☆11 · Updated last year
- ☆83 · Updated 3 months ago
- Challenges for general-purpose web-browsing AI agents ☆67 · Updated 6 months ago
- ☆140 · Updated 9 months ago
- A toolkit for building computer use AI agents ☆179 · Updated 5 months ago
- 🦾💻🌐 distributed training & serverless inference at scale on RunPod ☆18 · Updated last year
- The theory of mind module for the SWE agent ☆44 · Updated last week
- An open-source debugging agent in VSCode ☆79 · Updated last year
- MarinaBox is a toolkit for creating and managing secure, isolated environments for AI agents ☆140 · Updated last week
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ☆220 · Updated 2 weeks ago
- An Automatic Prompt Optimization Framework for Large Language Models ☆138 · Updated 4 months ago
- ☆89 · Updated 10 months ago
- Open Agent Computer Interface ☆89 · Updated last year
- Not Diamond Python SDK ☆89 · Updated 2 weeks ago
- Letting Claude Code develop its own MCP tools :) ☆123 · Updated 8 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 · Updated 10 months ago
- An MCP server that uses the Osmosis-Apply-1.7B model to apply code merges ☆53 · Updated 5 months ago
- Rank LLMs, RAG systems, and prompts using automated head-to-head evaluation ☆107 · Updated 11 months ago
- ☆123 · Updated 2 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆225 · Updated this week
- Metadspy: The framework for specifying—not programming—language models ☆88 · Updated 5 months ago