browser-use / eval
☆39 Updated 7 months ago
Alternatives and similar repositories for eval
Users who are interested in eval are comparing it to the libraries listed below.
- ☆33 Updated last year
- Voyage AI Official Python Library ☆71 Updated last month
- Natural Language Interfaces Powered by LLMs ☆94 Updated last year
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆84 Updated 5 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆111 Updated 4 months ago
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks. ☆37 Updated last year
- proof-of-concept of Cursor's Instant Apply feature ☆83 Updated last year
- A daemon that makes a desktop OS accessible to AI agents ☆33 Updated 3 months ago
- ☆40 Updated 3 months ago
- LLM prompt language based on Jinja. Banks provides tools and functions to build prompts text and chat messages from generic blueprints. I… ☆114 Updated last month
- Official Repo for CRMArena and CRMArena-Pro ☆110 Updated 2 months ago
- ☆47 Updated last year
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆68 Updated last year
- A library to extract the main content from html. Developed for information on LLM and for feeding data into LangChain and LlamaIndex. ☆46 Updated last year
- Reactive DDD with DSPy ☆22 Updated last year
- Simple examples using Argilla tools to build AI ☆55 Updated 9 months ago
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents! ☆107 Updated last month
- Simple Graph Memory for AI applications ☆90 Updated 3 months ago
- Not financial advice. ☆28 Updated 2 years ago
- Anthropic Computer Use with Modal Sandboxes ☆37 Updated 10 months ago
- ☆105 Updated 2 months ago
- Using modal.com to process FineWeb-edu data ☆20 Updated 4 months ago
- Aider's refactoring benchmark exercises based on popular python repos ☆77 Updated 10 months ago
- Embedding models from Jina AI ☆64 Updated last year
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆78 Updated 6 months ago
- Data Questionnaire Agent Chatbot ☆68 Updated 3 months ago
- A multimodal RAG application that enables semantic search on multimedia sources like audio, video and images ☆40 Updated last year
- Apps that run on modal.com ☆12 Updated last month
- ☆123 Updated last year
- ☆57 Updated last year