browser-use / eval
☆42Updated last year
Alternatives and similar repositories for eval
Users that are interested in eval are comparing it to the libraries listed below
- Voyage AI Official Python Library☆91Updated last week
- ☆33Updated 2 years ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models☆115Updated 9 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo)☆90Updated last month
- A better way of testing, inspecting, and analyzing AI Agent traces.☆46Updated 3 weeks ago
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!☆53Updated 6 months ago
- proof-of-concept of Cursor's Instant Apply feature☆88Updated last year
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks.☆44Updated last year
- ☆18Updated last year
- Official Repo for CRMArena and CRMArena-Pro☆132Updated this week
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems"☆60Updated 11 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments☆99Updated 4 months ago
- Harness used to benchmark aider against SWE Bench benchmarks☆79Updated last year
- Routing on Random Forest (RoRF)☆239Updated last year
- A toolkit for building computer use AI agents☆182Updated 7 months ago
- Solving data for LLMs - Create quality synthetic datasets!☆151Updated last year
- ☆87Updated last year
- DSPY on action with OpenSource LLMs.☆103Updated last year
- Code interpreter support for o1☆31Updated last year
- Using modal.com to process FineWeb-edu data☆20Updated 10 months ago
- A Ruby on Rails style framework for the DSPy (Demonstrate, Search, Predict) project for Language Models like GPT, BERT, and LLama.☆132Updated last year
- Natural Language Interfaces Powered by LLMs☆95Updated last year
- Simple examples using Argilla tools to build AI☆57Updated last year
- Chrome Extension for exploring Hugging Face datasets 🔎☆48Updated last year
- ☆30Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆92Updated last year
- Query language for blending SQL and local language models across structured + unstructured data, with type constraints.☆159Updated this week
- Training setup for Langchain's Open Deep Research☆74Updated 5 months ago
- 📚 Benchmark your browser agent on ~2.5k READ and ACTION based tasks☆85Updated 6 months ago
- Anthropic Computer Use with Modal Sandboxes☆43Updated last year