browser-use / eval
☆42 Updated last year
Alternatives and similar repositories for eval
Users who are interested in eval are comparing it to the libraries listed below.
- Voyage AI Official Python Library ☆91 Updated this week
- ☆33 Updated 2 years ago
- proof-of-concept of Cursor's Instant Apply feature ☆88 Updated last year
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆114 Updated 9 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆92 Updated last year
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆90 Updated last month
- ☆87 Updated last year
- Simple examples using Argilla tools to build AI ☆57 Updated last year
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents! ☆135 Updated 4 months ago
- Anthropic Computer Use with Modal Sandboxes ☆42 Updated last year
- Routing on Random Forest (RoRF) ☆239 Updated last year
- Nexusflow function call, tool use, and agent benchmarks. ☆30 Updated last year
- Code interpreter support for o1 ☆31 Updated last year
- Solving data for LLMs - Create quality synthetic datasets! ☆151 Updated last year
- ☆18 Updated last year
- Open Agent Computer Interface ☆92 Updated last year
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems" ☆60 Updated 11 months ago
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆53 Updated 6 months ago
- Welcome to ResearchAgent ! A personal research assistant powered by GPT-3.5/GPT-4. You can ask follow up questions. Get source details o… ☆36 Updated 2 years ago
- ☆40 Updated 8 months ago
- Official Repo for CRMArena and CRMArena-Pro ☆133 Updated 2 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆99 Updated 3 months ago
- Simple Graph Memory for AI applications ☆90 Updated 8 months ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆26 Updated 10 months ago
- Voice agent using LiveKit (orchestration), Cartesia (TTS), OpenAI (LLM), and Deepgram (STT) ☆20 Updated 3 months ago
- Own your AI, search the web with it🌐😎 ☆94 Updated last year
- Harness used to benchmark aider against SWE Bench benchmarks ☆78 Updated last year
- ☆132 Updated last month
- Make DSPy Agentic using protocol-first approach that support the Agent Protocols like MCP, A2A ☆66 Updated 8 months ago
- ☆57 Updated last week