Python SDK for running evaluations on LLM generated responses
☆300 · Jun 6, 2025 · Updated 11 months ago
Alternatives and similar repositories for athina-evals
Users who are interested in athina-evals are comparing it to the libraries listed below.
- Data-Driven Evaluation for LLM-Powered Applications ☆516 · Jan 22, 2025 · Updated last year
- Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 ☆5,752 · May 18, 2026 · Updated last week
- A tool for evaluating LLMs ☆428 · Mar 15, 2026 · Updated 2 months ago
- UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured ch… ☆2,350 · Aug 18, 2024 · Updated last year
- Supercharge Your LLM Application Evaluations ☆14,123 · Feb 24, 2026 · Updated 3 months ago
- LLM Testing SDK that helps you write and run tests to monitor your LLM app in production ☆132 · Jan 22, 2024 · Updated 2 years ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆82 · Feb 13, 2025 · Updated last year
- The LLM Evaluation Framework ☆15,681 · Updated this week
- Small, simple agent task environments for training and evaluation ☆19 · Nov 1, 2024 · Updated last year
- AI Observability & Evaluation ☆9,859 · Updated this week
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ☆325 · Jul 10, 2025 · Updated 10 months ago
- Deepmark AI enables a unique testing environment for language models (LLM) assessment on task-specific metrics and on your own data so yo… ☆104 · Nov 24, 2023 · Updated 2 years ago
- AutoEvals is a tool for quickly and easily evaluating AI model outputs using best practices. ☆908 · Apr 3, 2026 · Updated last month
- AI Evaluation Platform ☆49 · May 26, 2025 · Updated last year
- REST API for Large Language Models using FastAPI, Redis and LiteLLM ☆14 · Nov 30, 2023 · Updated 2 years ago
- LLM evaluation. ☆16 · Nov 7, 2023 · Updated 2 years ago
- structured outputs for llms ☆13,023 · May 24, 2026 · Updated last week
- Langtrace is an open-source, Open Telemetry based end-to-end observability tool for LLM applications, providing real-time tracing, ev… ☆1,203 · Nov 17, 2025 · Updated 6 months ago
- The platform for LLM evaluations and AI agent testing ☆3,274 · Updated this week
- 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with Open… ☆28,205 · Updated this week
- Audio tokenization, in the fastest way possible! ☆54 · Aug 26, 2024 · Updated last year
- OpenTelemetry Instrumentation for AI Observability ☆990 · May 25, 2026 · Updated last week
- A blazing fast AI Gateway with integrated guardrails. Route to 1,600+ LLMs, 50+ AI Guardrails with 1 fast & friendly API. ☆11,912 · Updated this week
- An open-source visual programming environment for battle-testing prompts to LLMs. ☆2,987 · Apr 6, 2026 · Updated last month
- This repository contains various advanced techniques for Retrieval-Augmented Generation (RAG) systems. ☆2,527 · Feb 17, 2025 · Updated last year
- Simple AI agents / assistants ☆52 · Oct 8, 2024 · Updated last year
- A super framework for prompt engineering. ☆15 · Nov 20, 2024 · Updated last year
- An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design ☆22 · Dec 13, 2024 · Updated last year
- Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including C… ☆5,579 · Mar 19, 2026 · Updated 2 months ago
- Artificial Intelligence courses, projects, and resources ☆12 · Nov 28, 2016 · Updated 9 years ago
- AI Infrastructure Engage & Think Layers for Voice & Vision Interactions ☆22 · Jul 28, 2025 · Updated 10 months ago
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ☆3,180 · Mar 31, 2026 · Updated 2 months ago
- An attribution library for LLMs ☆46 · Sep 17, 2024 · Updated last year
- The repository contains code for Adaptive Data Optimization ☆36 · Dec 9, 2024 · Updated last year
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆48,644 · Updated this week
- ☆29 · May 30, 2023 · Updated 3 years ago
- The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place. ☆4,155 · Updated this week
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆30 · Dec 10, 2024 · Updated last year
- Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, De… ☆21,715 · Updated this week