AlmogBaku / pytest-evals

A pytest plugin for running and analyzing LLM evaluation tests.

☆127

Alternatives and similar repositories for pytest-evals

Users who are interested in pytest-evals compare it to the libraries listed below.

AlmogBaku / openai-streaming
Work with OpenAI's streaming API at ease with Python generators
☆121 Updated last year
raptor-ml / raptor
Transform your pythonic research to an artifact that engineers can deploy easily.
☆154 Updated last week
ilanbenb / wa_llm
A WhatsApp bot that can participate in group conversations, powered by AI. The bot monitors group messages and responds when mentioned.
☆78 Updated last week
Optibus / playback
Record your service operations in production and replay them locally at any time in a sandbox
☆106 Updated 5 months ago
klilhalahmi / SLMs-based-RAG
☆10 Updated 10 months ago
christo-olivier / modelsmith
Python library that allows you to get structured responses in the form of Pydantic models and Python types from Anthropic, Google Vertex …
☆78 Updated 11 months ago
SaharCarmel / TheAlmanac
A documentation assistant leveraging Model Context Protocol (MCP) to help programmers access the most up-to-date and relevant information…
☆19 Updated 3 months ago
gilad-rubin / hypster
HyPSTER - HyperParameter optimization on STERoids
☆48 Updated 7 months ago
aviveldan / datagov-mcp
MCP server for Israel Government Data
☆60 Updated this week
avilum / yalla
A tiny LLM Agent with minimal dependencies, focused on local inference.
☆53 Updated 8 months ago
kwanUm / awesome-data-quality
Curated list of tools and frameworks assisting in monitoring data quality
☆12 Updated 3 years ago
superwise-ai / elemeta
Metafeature Extraction for Unstructured Data
☆102 Updated 3 months ago
cyberark / agentwatch
A powerful AI observability framework that provides comprehensive insights into agent interactions across platforms, enabling developers …
☆86 Updated last month
mangate / SupportAssistant
Self Support ChatBot
☆16 Updated 3 months ago
langfuse / langfuse-python
🪢 Langfuse Python SDK - Instrument your LLM app with decorators or low-level SDK and get detailed tracing/observability. Works with any …
☆203 Updated last week
lasso-security / mcp-gateway
A plugin-based gateway that orchestrates other MCPs and allows developers to build upon it enterprise-grade agents.
☆207 Updated 2 months ago
cfahlgren1 / observers
A Lightweight Library for AI Observability
☆246 Updated 4 months ago
amoffat / HeimdaLLM
Constrain LLM output
☆112 Updated 11 months ago
quotient-ai / judges
A small library of LLM judges
☆216 Updated last week
ContextData / VectorETL
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
☆98 Updated 8 months ago
eugeneyan / align-app
☆72 Updated 7 months ago
Arize-ai / openinference
OpenTelemetry Instrumentation for AI Observability
☆480 Updated this week
markov-kernel / databricks-mcp
☆37 Updated 2 weeks ago
pydantic / logfire-mcp
The Logfire MCP Server is here!
☆78 Updated last month
tg1482 / priomptipy
A python implementation of priompt - a neat way of managing context from diverse sources for LLM applications.
☆111 Updated 10 months ago
brizzai / auto-mcp
Transform any OpenAPI/Swagger definition into a fully-featured Model Context Protocol (MCP) server
☆152 Updated 2 weeks ago
saharmor / voice-lab
Testing and evaluation framework for voice agents
☆124 Updated 3 weeks ago
helmanofer / pydantic-prompter
A lightweight tool that lets you simply build prompts and get Pydantic objects as outputs
☆19 Updated 3 weeks ago
kolenaIO / autoarena
Rank LLMs, RAG systems, and prompts using automated head-to-head evaluation
☆104 Updated 6 months ago
vespperhq / vespper
Open-source AI copilot that lets you chat with your observability data and code 🧙‍♂️
☆351 Updated 2 months ago