Python SDK for running evaluations on LLM generated responses
☆300 · Jun 6, 2025 · Updated 11 months ago
Alternatives and similar repositories for athina-evals
Users who are interested in athina-evals are comparing it to the libraries listed below.
- Data-Driven Evaluation for LLM-Powered Applications ☆516 · Jan 22, 2025 · Updated last year
- Summaries of AI Research Papers ☆18 · Jun 29, 2024 · Updated last year
- 🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓 ☆5,610 · Updated this week
- A tool for evaluating LLMs ☆429 · Mar 15, 2026 · Updated last month
- UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured ch… ☆2,346 · Aug 18, 2024 · Updated last year
- Supercharge Your LLM Application Evaluations 🚀 ☆13,785 · Feb 24, 2026 · Updated 2 months ago
- LLM Testing SDK that helps you write and run tests to monitor your LLM app in production ☆132 · Jan 22, 2024 · Updated 2 years ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆82 · Feb 13, 2025 · Updated last year
- The LLM Evaluation Framework ☆15,111 · May 1, 2026 · Updated last week
- AI Observability & Evaluation ☆9,523 · Updated this week
- Small, simple agent task environments for training and evaluation ☆19 · Nov 1, 2024 · Updated last year
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ☆325 · Jul 10, 2025 · Updated 10 months ago
- Deepmark AI enables a unique testing environment for language models (LLM) assessment on task-specific metrics and on your own data so yo… ☆104 · Nov 24, 2023 · Updated 2 years ago
- AutoEvals is a tool for quickly and easily evaluating AI model outputs using best practices. ☆887 · Apr 3, 2026 · Updated last month
- AI Evaluation Platform ☆48 · May 26, 2025 · Updated 11 months ago
- LLM evaluation. ☆16 · Nov 7, 2023 · Updated 2 years ago
- structured outputs for llms ☆12,889 · Apr 22, 2026 · Updated 2 weeks ago
- Langtrace 🔍 is an open-source, Open Telemetry based end-to-end observability tool for LLM applications, providing real-time tracing, ev… ☆1,198 · Nov 17, 2025 · Updated 5 months ago
- The platform for LLM evaluations and AI agent testing ☆3,240 · Updated this week
- 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with Open… ☆26,702 · Updated this week
- Audio tokenization, in the fastest way possible! ☆54 · Aug 26, 2024 · Updated last year
- OpenTelemetry Instrumentation for AI Observability ☆949 · May 2, 2026 · Updated last week
- A blazing fast AI Gateway with integrated guardrails. Route to 1,600+ LLMs, 50+ AI Guardrails with 1 fast & friendly API. ☆11,640 · Mar 25, 2026 · Updated last month
- An open-source visual programming environment for battle-testing prompts to LLMs. ☆2,980 · Apr 6, 2026 · Updated last month
- This repository contains various advanced techniques for Retrieval-Augmented Generation (RAG) systems. ☆2,510 · Feb 17, 2025 · Updated last year
- Simple AI agents / assistants ☆52 · Oct 8, 2024 · Updated last year
- Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including C… ☆5,512 · Mar 19, 2026 · Updated last month
- An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design ☆22 · Dec 13, 2024 · Updated last year
- A super framework for prompt engineering. ☆15 · Nov 20, 2024 · Updated last year
- Artificial Intelligence courses, projects, and resources ☆12 · Nov 28, 2016 · Updated 9 years ago
- AI Infrastructure Engage & Think Layers for Voice & Vision Interactions ☆22 · Jul 28, 2025 · Updated 9 months ago
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ☆3,171 · Mar 31, 2026 · Updated last month
- An attribution library for LLMs ☆46 · Sep 17, 2024 · Updated last year
- The repository contains code for Adaptive Data Optimization ☆36 · Dec 9, 2024 · Updated last year
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆45,804 · Updated this week
- The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place. ☆4,090 · May 3, 2026 · Updated last week
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆30 · Dec 10, 2024 · Updated last year
- Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Ll… ☆20,923 · Updated this week
- LangEvals aggregates various language model evaluators into a single platform, providing a standard interface for a multitude of scores a… ☆72 · Feb 15, 2026 · Updated 2 months ago