braintrustdata / autoevals
AutoEvals is a tool for quickly and easily evaluating AI model outputs using best practices.
☆473Updated this week
Alternatives and similar repositories for autoevals:
Users who are interested in autoevals are comparing it to the libraries listed below:
- structured extraction for llms☆711Updated 3 months ago
- ☆149Updated 3 months ago
- Fully typed & consistent chat APIs for OpenAI, Anthropic, Groq, and Azure's chat models for browser, edge, and node environments.☆170Updated 11 months ago
- Low latency JSON generation using LLMs ⚡️☆400Updated last year
- Prompt engineering, automated.☆308Updated 2 weeks ago
- Create state-machine-powered LLM agents using XState☆283Updated 2 weeks ago
- Python SDK for running evaluations on LLM generated responses☆278Updated last week
- ☆32Updated last month
- ☆120Updated this week
- 🛠️ The toolkit for codebase mapping, symbol extraction, and many kinds of code search. Build AI-powered devtools!☆238Updated this week
- ☆195Updated last year
- Data-Driven Evaluation for LLM-Powered Applications☆492Updated 3 months ago
- Super performant RAG pipelines for AI apps. Summarization, Retrieve/Rerank and Code Interpreters in one simple API.☆372Updated last year
- Sister project to OpenLLMetry, but in TypeScript. Open-source observability for your LLM application, based on OpenTelemetry☆313Updated last week
- Simple AI coder that can do most of my work for me, including working on himself.☆235Updated last month
- Create repos and commits with AI.☆293Updated last year
- smol-podcaster is your podcast production agent 🎙️☆342Updated 7 months ago
- Comprehensive Vector Data Tooling. The universal interface for all vector database, datasets and RAG platforms. Easily export, import, ba…☆239Updated this week
- A fuzzy key value store based on semantic similarity rather than lexical equality.☆273Updated 5 months ago
- Natural language search for complex JSON arrays, with AI Quickstart.☆396Updated 11 months ago
- Logging and caching superpowers for the openai sdk☆105Updated last year
- Build hours code to share.☆226Updated 4 months ago
- llm-consortium orchestrates multiple LLMs, iteratively refines & achieves consensus.☆248Updated last week
- Automatically reformat any JSON into any schema with AI☆328Updated last month
- Verdict is a library for scaling judge-time compute.☆209Updated last week
- ☆60Updated this week
- ☆401Updated 8 months ago
- The "official" unofficial DSPy framework. Build LLM powered agents and other workflows, based on the Stanford DSP paper.☆1,459Updated last week
- AgentKit: Build multi-agent networks in TypeScript with deterministic routing and rich tooling via MCP.☆433Updated 2 weeks ago
- Readymade evaluators for your LLM apps☆371Updated this week