browser-use / eval
☆42 Updated last year
Alternatives and similar repositories for eval
Users who are interested in eval are comparing it to the libraries listed below.
- ☆33 Updated 2 years ago
- DSPY on action with OpenSource LLMs. ☆103 Updated last year
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆99 Updated 4 months ago
- Query language for blending SQL and local language models across structured + unstructured data, with type constraints. ☆159 Updated this week
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆90 Updated last month
- proof-of-concept of Cursor's Instant Apply feature ☆88 Updated last year
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks. ☆44 Updated last year
- Natural Language Interfaces Powered by LLMs ☆95 Updated last year
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆115 Updated 9 months ago
- Voyage AI Official Python Library ☆91 Updated last week
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆46 Updated 3 weeks ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆92 Updated last year
- Verbosity control for AI agents ☆66 Updated last year
- Chrome Extension for exploring Hugging Face datasets 🔎 ☆48 Updated last year
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆73 Updated 3 months ago
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆53 Updated 6 months ago
- Simple Graph Memory for AI applications ☆90 Updated 8 months ago
- Code interpreter support for o1 ☆31 Updated last year
- Embedding models from Jina AI ☆65 Updated 2 years ago
- Demo of knowledge graph creation and Graph RAG with BAML and Kuzu ☆73 Updated 4 months ago
- A collection of Compound Retrieval Systems implemented with DSPy and Weaviate. ☆94 Updated 3 weeks ago
- ☆57 Updated 2 weeks ago
- ☆87 Updated last year
- Routing on Random Forest (RoRF) ☆239 Updated last year
- Harness used to benchmark aider against SWE Bench benchmarks ☆79 Updated last year
- Data Questionnaire Agent Chatbot ☆71 Updated this week
- Not Diamond Python SDK ☆90 Updated last month
- Anthropic Computer Use with Modal Sandboxes ☆43 Updated last year
- Solving data for LLMs - Create quality synthetic datasets! ☆151 Updated last year
- ☆133 Updated last month