sagekit / webvoyager
Magnitude achieves SOTA 94% on WebVoyager benchmark
☆29 · Updated 4 months ago
Alternatives and similar repositories for webvoyager
Users interested in webvoyager are comparing it to the libraries listed below.
- The fastest, lightest, and easiest-to-integrate AI gateway on the market. Fully open-sourced. ☆462 · Updated 3 months ago
- Prompt engineering, automated. ☆348 · Updated 7 months ago
- Infrastructure that's powering E2B Cloud. ☆727 · Updated this week
- Multi-language code navigation API in a container ☆95 · Updated 3 months ago
- A fully customizable and self-hosted sandboxing solution for AI agent code execution and computer use. It features out-of-the-box support… ☆660 · Updated 5 months ago
- A simple Python sandbox for helpful LLM data agents ☆294 · Updated last year
- Python SDK for running evaluations on LLM generated responses ☆293 · Updated 5 months ago
- 🤖 Headless IDE for AI agents ☆200 · Updated last month
- Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd… ☆366 · Updated 2 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆310 · Updated 2 weeks ago
- Sidecar is the AI brains for the Aide editor and works alongside it, locally on your machine ☆589 · Updated 6 months ago
- Open source AI Agent evaluation framework for web tasks 🐒🍌 ☆312 · Updated 10 months ago
- ACP is the Agent Control Plane - a distributed agent scheduler optimized for simplicity, clarity, and control. It is designed for outer-l… ☆246 · Updated 4 months ago
- Giving Claude ability to run code with E2B via MCP (Model Context Protocol) ☆349 · Updated 2 weeks ago
- Memory Library for Building Agents with Social Intelligence ☆243 · Updated this week
- Routing on Random Forest (RoRF) ☆219 · Updated last year
- 📚 Benchmark your browser agent on ~2.5k READ and ACTION based tasks ☆73 · Updated 3 months ago
- A tool kit for generating high quality prompts using DSPy GEPA optimizer ☆285 · Updated last month
- Deep Research for your internal data ☆348 · Updated 5 months ago
- A toolkit for building computer use AI agents ☆178 · Updated 4 months ago
- AutoEvals is a tool for quickly and easily evaluating AI model outputs using best practices. ☆722 · Updated this week
- Open-source versioning, tracing, and annotation tooling. ☆204 · Updated 2 weeks ago
- ☆142 · Updated 8 months ago
- Provider-agnostic, open-source evaluation infrastructure for language models ☆653 · Updated last week
- ☆478 · Updated this week
- Work with web-enabled agents quickly — whether running a quick task or bootstrapping a full-stack product. ☆93 · Updated last year
- An MCP Server that's also an MCP Client. Useful for letting Claude develop and test MCPs without needing to reset the application. ☆124 · Updated 8 months ago
- The easiest and fastest way to run AI-generated Python code safely ☆339 · Updated 11 months ago
- Postman for MCP servers ☆123 · Updated 3 months ago
- Testing and evaluation framework for voice agents ☆154 · Updated 5 months ago