Python SDK for running evaluations on LLM-generated responses
★298 · Jun 6, 2025 · Updated 9 months ago
Alternatives and similar repositories for athina-evals
Users that are interested in athina-evals are comparing it to the libraries listed below
- Data-Driven Evaluation for LLM-Powered Applications ★515 · Jan 22, 2025 · Updated last year
- Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 ★5,197 · Updated this week
- UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured ch… ★2,336 · Aug 18, 2024 · Updated last year
- A tool for evaluating LLMs ★428 · May 10, 2024 · Updated last year
- AI Observability & Evaluation ★8,746 · Updated this week
- Small, simple agent task environments for training and evaluation ★19 · Nov 1, 2024 · Updated last year
- The LLM Evaluation Framework ★13,984 · Updated this week
- Supercharge Your LLM Application Evaluations ★12,826 · Feb 24, 2026 · Updated last week
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ★324 · Jul 10, 2025 · Updated 7 months ago
- The platform for LLM evaluations and AI agent testing ★3,077 · Updated this week
- A realtime serving engine for Data-Intensive Generative AI Applications ★1,343 · Updated this week
- structured outputs for llms ★12,468 · Feb 25, 2026 · Updated last week
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ★82 · Feb 13, 2025 · Updated last year
- Langtrace is an open-source, Open Telemetry based end-to-end observability tool for LLM applications, providing real-time tracing, ev… ★1,185 · Nov 17, 2025 · Updated 3 months ago
- An attribution library for LLMs ★46 · Sep 17, 2024 · Updated last year
- Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including C… ★5,333 · Oct 30, 2025 · Updated 4 months ago
- Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with Open… ★22,717 · Updated this week
- An open-source visual programming environment for battle-testing prompts to LLMs. ★2,954 · Jan 2, 2026 · Updated 2 months ago
- Get 100% uptime, reliability from OpenAI. Handle Rate Limit, Timeout, API, Keys Errors ★698 · Nov 20, 2023 · Updated 2 years ago
- Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude,… ★10,821 · Updated this week
- OpenTelemetry Instrumentation for AI Observability ★871 · Updated this week
- AI Evaluation Platform ★48 · May 26, 2025 · Updated 9 months ago
- A blazing fast AI Gateway with integrated guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API. ★10,807 · Updated this week
- Deepmark AI enables a unique testing environment for language models (LLM) assessment on task-specific metrics and on your own data so yo… ★104 · Nov 24, 2023 · Updated 2 years ago
- [ACL 2024] A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset ★25 · May 29, 2025 · Updated 9 months ago
- fork of litellm that is open source ★22 · Jan 22, 2026 · Updated last month
- LLM evaluation. ★16 · Nov 7, 2023 · Updated 2 years ago
- Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Manageme… ★2,261 · Updated this week
- The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place. ★3,887 · Updated this week
- AI-powered tools to automate code documentation and optimize developer operations. ★42 · Feb 9, 2026 · Updated last month
- Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chro… ★3,022 · Feb 11, 2026 · Updated 3 weeks ago
- Open-source, developer-first LLMOps platform designed to streamline prompt design, version management, instant delivery, collaboratio… ★3,194 · Jun 28, 2025 · Updated 8 months ago
- Vision utilities for web interaction agents ★1,755 · Nov 25, 2024 · Updated last year
- Superagent protects your AI applications against prompt injections, data leaks, and harmful outputs. Embed safety directly into your app … ★6,449 · Feb 3, 2026 · Updated last month
- This repository contains various advanced techniques for Retrieval-Augmented Generation (RAG) systems. ★2,465 · Feb 17, 2025 · Updated last year
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ★37,994 · Updated this week
- LLM Testing SDK that helps you write and run tests to monitor your LLM app in production ★132 · Jan 22, 2024 · Updated 2 years ago
- SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. ★7,720 · Nov 7, 2025 · Updated 4 months ago
- Laminar - open-source observability platform purpose-built for AI agents. YC S24. ★2,662 · Updated this week