Python SDK for running evaluations on LLM-generated responses
★299 · Jun 6, 2025 · Updated 10 months ago
Alternatives and similar repositories for athina-evals
Users who are interested in athina-evals are comparing it to the libraries listed below.
- Data-Driven Evaluation for LLM-Powered Applications ★516 · Jan 22, 2025 · Updated last year
- Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 ★5,497 · Apr 11, 2026 · Updated last week
- A tool for evaluating LLMs ★428 · Mar 15, 2026 · Updated last month
- UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured ch… ★2,343 · Aug 18, 2024 · Updated last year
- Supercharge Your LLM Application Evaluations ★13,415 · Feb 24, 2026 · Updated last month
- LLM Testing SDK that helps you write and run tests to monitor your LLM app in production ★132 · Jan 22, 2024 · Updated 2 years ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ★82 · Feb 13, 2025 · Updated last year
- The LLM Evaluation Framework ★14,728 · Apr 9, 2026 · Updated last week
- AI Observability & Evaluation ★9,284 · Updated this week
- Small, simple agent task environments for training and evaluation ★19 · Nov 1, 2024 · Updated last year
- ★20 · Jul 19, 2023 · Updated 2 years ago
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ★325 · Jul 10, 2025 · Updated 9 months ago
- Deepmark AI enables a unique testing environment for language models (LLM) assessment on task-specific metrics and on your own data so yo… ★104 · Nov 24, 2023 · Updated 2 years ago
- Luna is inspired by Lucid, a framework for Feature Visualization. However, Luna is built on Tensorflow 2, and thus supports modern models… ★11 · Aug 17, 2022 · Updated 3 years ago
- AutoEvals is a tool for quickly and easily evaluating AI model outputs using best practices. ★861 · Apr 3, 2026 · Updated 2 weeks ago
- Clean and functional LLM frontend ★11 · Mar 7, 2025 · Updated last year
- AI Evaluation Platform ★48 · May 26, 2025 · Updated 10 months ago
- REST API for Large Language Models using FastAPI, Redis and LiteLLM ★14 · Nov 30, 2023 · Updated 2 years ago
- LLM evaluation. ★16 · Nov 7, 2023 · Updated 2 years ago
- structured outputs for llms ★12,749 · Updated this week
- Langtrace is an open-source, Open Telemetry based end-to-end observability tool for LLM applications, providing real-time tracing, ev… ★1,189 · Nov 17, 2025 · Updated 5 months ago
- The platform for LLM evaluations and AI agent testing ★3,206 · Updated this week
- Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with Open… ★25,055 · Updated this week
- Audio tokenization, in the fastest way possible! ★54 · Aug 26, 2024 · Updated last year
- OpenTelemetry Instrumentation for AI Observability ★918 · Updated this week
- A blazing fast AI Gateway with integrated guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API. ★11,358 · Mar 25, 2026 · Updated 3 weeks ago
- An open-source visual programming environment for battle-testing prompts to LLMs. ★2,971 · Apr 6, 2026 · Updated last week
- This repository contains various advanced techniques for Retrieval-Augmented Generation (RAG) systems. ★2,488 · Feb 17, 2025 · Updated last year
- Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including C… ★5,464 · Mar 19, 2026 · Updated last month
- An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design ★22 · Dec 13, 2024 · Updated last year
- A super framework for prompt engineering. ★15 · Nov 20, 2024 · Updated last year
- Artificial Intelligence courses, projects, and resources ★12 · Nov 28, 2016 · Updated 9 years ago
- AI Infrastructure Engage & Think Layers for Voice & Vision Interactions ★22 · Jul 28, 2025 · Updated 8 months ago
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ★3,165 · Mar 31, 2026 · Updated 2 weeks ago
- An attribution library for LLMs ★46 · Sep 17, 2024 · Updated last year
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ★43,478 · Updated this week
- The repository contains code for Adaptive Data Optimization ★35 · Dec 9, 2024 · Updated last year
- The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place. ★4,021 · Updated this week
- ★29 · May 30, 2023 · Updated 2 years ago