Magnitude achieves SOTA 94% on WebVoyager benchmark
☆37 · Jul 7, 2025 · Updated 8 months ago
Alternatives and similar repositories for webvoyager
Users who are interested in webvoyager are comparing it to the libraries listed below.
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 8 months ago
- ☆19 · Jun 6, 2025 · Updated 9 months ago
- Generate Python docstrings automatically with LLM and syntax trees ☆20 · Jun 13, 2025 · Updated 9 months ago
- Mimetics determines the file type, MIME type, and media type of a given file using magic numbers and content analysis to detect the most … ☆10 · Aug 7, 2025 · Updated 7 months ago
- ☆14 · Mar 21, 2025 · Updated 11 months ago
- Run LLMs on Replicate with vLLM ☆26 · Jul 19, 2025 · Updated 8 months ago
- The Onchain AI Oracle Intents Engine (IE): A Basic Text-to-tx Simulator Contract based on OAO. ☆16 · Feb 17, 2024 · Updated 2 years ago
- Create and manage isolated Git worktrees for AI coding agents. ☆27 · Mar 3, 2026 · Updated 2 weeks ago
- Example agents I've built using the LiveKit Agents (https://github.com/livekit/agents) framework ☆20 · May 11, 2024 · Updated last year
- A curated list of projects and resources using BAML ☆17 · Aug 1, 2025 · Updated 7 months ago
- NeurIPS 2024: SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation ☆13 · May 24, 2025 · Updated 9 months ago
- Run evals using LLM ☆27 · Jan 8, 2026 · Updated 2 months ago
- Official repository of the paper MPMQA: Multimodal Question Answering on Product Manuals (AAAI 2023) ☆19 · Nov 28, 2022 · Updated 3 years ago
- ☆14 · Feb 24, 2023 · Updated 3 years ago
- A lightweight library for Bayesian analysis of LLM evals (ICML 2025 Spotlight Position Paper) ☆24 · May 28, 2025 · Updated 9 months ago
- ☆14 · Mar 27, 2024 · Updated last year
- ☆14 · Apr 16, 2024 · Updated last year
- ☆22 · Jul 22, 2025 · Updated 7 months ago
- Chat with Uniswap v3 using natural language, powered by OpenAI Functions ☆12 · Oct 30, 2023 · Updated 2 years ago
- Tree-based indexes for neural-search ☆31 · Mar 4, 2024 · Updated 2 years ago
- ☆15 · Mar 12, 2026 · Updated last week
- An agent implemented using BAML and LangGraph to do deep research on questions and generate cited answers. ☆21 · May 4, 2025 · Updated 10 months ago
- ☆18 · Mar 26, 2025 · Updated 11 months ago
- Enterprise-grade Rust implementation of Anthropic's MCP protocol ☆43 · Feb 25, 2026 · Updated 3 weeks ago
- Business Data Benchmark (BDB) is a set of real-world questions to evaluate AI systems connected to business data. ☆24 · Dec 3, 2024 · Updated last year
- MCP wrapper for OpenAI built-in tools ☆12 · Mar 13, 2025 · Updated last year
- A very simple cross-service LLM API for Python ☆23 · Nov 30, 2023 · Updated 2 years ago
- ☆25 · May 28, 2025 · Updated 9 months ago
- A collection of network-related Python utilities. ☆17 · Sep 8, 2023 · Updated 2 years ago
- ☆20 · Feb 10, 2025 · Updated last year
- ☆17 · Jun 7, 2024 · Updated last year
- Move data between environments using Dataplex ☆15 · Feb 24, 2026 · Updated 3 weeks ago
- Simple, non-authoritative benchmarks for embedded databases running in GitHub Actions ☆10 · Jul 11, 2024 · Updated last year
- VS Code extension for editing Accord Project artifacts ☆15 · Feb 24, 2023 · Updated 3 years ago
- Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation, ECCV 2024 ☆22 · Feb 15, 2024 · Updated 2 years ago
- Code for the paper: Prompts have evil twins (EMNLP 2024) ☆23 · Feb 10, 2025 · Updated last year
- Gherkin DSL for Ginkgo ☆11 · Nov 15, 2023 · Updated 2 years ago
- An implementation of GraphRAG using open source small LLMs ☆14 · Nov 9, 2024 · Updated last year
- Auto-generate Next.js 14 UI from your Prisma Schema in seconds ☆22 · Updated this week