AutoEvals is a tool for quickly and easily evaluating AI model outputs using best practices.
☆811 · Feb 21, 2026 · Updated last week
Alternatives and similar repositories for autoevals
Users who are interested in autoevals compare it to the libraries listed below.
- Evaluate your LLM-powered apps with TypeScript · ☆1,384 · Feb 20, 2026 · Updated last week
- ☆50 · Jan 20, 2026 · Updated last month
- JavaScript Tracing & Evals library for Braintrust · ☆116 · Updated this week
- The TypeScript LLM Evaluation Library · ☆155 · Nov 11, 2025 · Updated 3 months ago
- Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude,… · ☆10,691 · Updated this week
- Prompt design using JSX. · ☆2,769 · Oct 15, 2025 · Updated 4 months ago
- A vitest extension for running evals. · ☆127 · Jan 23, 2026 · Updated last month
- ☆383 · Updated this week
- structured outputs for llms · ☆12,428 · Updated this week
- The LLM Evaluation Framework · ☆13,787 · Updated this week
- Evals meant to evaluate language models' ability to reason over long contexts. · ☆10 · Sep 12, 2024 · Updated last year
- Laminar - open-source observability platform purpose-built for AI agents. YC S24. · ☆2,619 · Updated this week
- 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with Open… · ☆22,415 · Updated this week
- AI Observability & Evaluation · ☆8,666 · Updated this week
- 🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓 · ☆5,156 · Updated this week
- From the team behind Gatsby, Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack. · ☆21,376 · Updated this week
- The pretty much "official" DSPy framework for Typescript · ☆2,434 · Updated this week
- The AI Browser Automation Framework · ☆21,261 · Updated this week
- The platform for LLM evaluations and AI agent testing · ☆2,837 · Updated this week
- DSPy: The framework for programming—not prompting—language models · ☆32,381 · Updated this week
- A Workers AI provider for the vercel AI SDK · ☆115 · Mar 18, 2025 · Updated 11 months ago
- Python SDK for running evaluations on LLM generated responses · ☆298 · Jun 6, 2025 · Updated 8 months ago
- Run AI workflows with TypeScript & Vercel AI SDK · ☆260 · Jul 5, 2025 · Updated 7 months ago
- OTEL ingestion running on Cloudflare Workers · ☆49 · Apr 8, 2025 · Updated 10 months ago
- The AI framework that adds the engineering to prompt engineering (Python/TS/Ruby/Java/C#/Rust/Go compatible) · ☆7,655 · Updated this week
- Structured Outputs · ☆13,456 · Feb 13, 2026 · Updated 2 weeks ago
- Developer toolkit that makes it simple to build with the Workers AI platform. · ☆182 · Oct 1, 2024 · Updated last year
- The leading workflow orchestration platform. Run stateful step functions and AI workflows on serverless, servers, or the edge. · ☆4,927 · Updated this week
- an ambient intelligence library · ☆6,083 · Feb 19, 2026 · Updated last week
- Supercharge Your LLM Application Evaluations 🚀 · ☆12,736 · Updated this week
- Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks. · ☆17,889 · Nov 3, 2025 · Updated 3 months ago
- AI Hero's open-source examples and course material. Learn AI Engineering with a single repo. · ☆1,333 · Jul 22, 2025 · Updated 7 months ago
- A lightweight React Hook intended mainly for AI chat applications, for smoothly sticking to bottom of messages · ☆684 · Feb 6, 2026 · Updated 3 weeks ago
- The AI Toolkit for TypeScript. From the creators of Next.js, the AI SDK is a free open-source library for building AI-powered application… · ☆21,971 · Updated this week
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… · ☆37,083 · Updated this week
- Langtrace 🔍 is an open-source, Open Telemetry based end-to-end observability tool for LLM applications, providing real-time tracing, ev… · ☆1,183 · Nov 17, 2025 · Updated 3 months ago
- Typescript/React Library for AI Chat💬🚀 · ☆8,632 · Updated this week
- Python & JS/TS SDK for running AI-generated code/code interpreting in your AI app · ☆2,220 · Updated this week
- Embeddable Postgres with real-time, reactive bindings. · ☆14,778 · Updated this week