JudgmentLabs / judgeval
Rigorously test, monitor, and optimize agent systems—grounded in research from Stanford and Berkeley AI Labs.
☆54 Updated this week
Alternatives and similar repositories for judgeval
Users who are interested in judgeval are comparing it to the libraries listed below.
- A fork of Anthropic Computer Use that you can run on Mac computers to give Claude and other AI models autonomous access to your computer. ☆796 Updated 5 months ago
- A1Base NextJS template ☆60 Updated this week
- Optimizing inference proxy for LLMs ☆2,220 Updated this week
- Preswald is a WASM packager for Python-based interactive data apps: bundle full complex data workflows, particularly visualizations, into… ☆3,660 Updated this week
- Lychee Spark Debugger ☆57 Updated this week
- AI Agents for Enterprise Software Automation ☆43 Updated 4 months ago
- Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including O… ☆4,382 Updated this week
- DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report writing. Now expanding into… ☆1,296 Updated 2 weeks ago
- A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions ☆1,040 Updated 2 weeks ago
- WebRover is an autonomous AI agent designed to interpret user input and execute actions by interacting with web elements to accomplish ta… ☆917 Updated 3 months ago
- A Python package that makes it easy for developers to create AI apps powered by various AI providers. ☆1,605 Updated last month
- 🐍 Sublingual helps you log and analyze all of your LLM calls, including the prompt template, call parameters, responses, tool calls, and… ☆51 Updated 2 months ago
- Codebase and CLI for PLAPT: A state-of-the-art protein-ligand binding affinity model for drug discovery ☆96 Updated last month
- The AI framework that adds the engineering to prompt engineering (Python/TS/Ruby/Java/C#/Rust/Go compatible) ☆3,622 Updated this week
- This repository provides an advanced Retrieval-Augmented Generation (RAG) solution for complex question answering. It uses sophisticated … ☆1,206 Updated this week
- Parlant is the open-source engine for controlled, compliant, and purposeful generative AI conversations. It gives you the power of LLMs w… ☆2,728 Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆3,935 Updated 9 months ago
- AdalFlow: The library to build & auto-optimize LLM applications. ☆3,031 Updated last month
- ☆58 Updated 2 months ago
- Perplexity powered AI assistant for time based trivia games ☆148 Updated 2 months ago
- A system for agentic LLM-powered data processing and ETL ☆1,947 Updated this week
- HumanLayer enables AI agents to communicate with humans in tool-based and async workflows. Guarantee human oversight of high-stakes funct… ☆783 Updated 2 weeks ago
- Recipes for AI agents that use Asteroid to be safe and reliable. Want yours featured? Submit a PR! ☆46 Updated last month
- ☆19 Updated this week
- The fastest way to build robust AI agents ☆1,802 Updated last week
- Taskiq plugin for postgres broker and results backend ☆44 Updated 3 weeks ago
- Reasoning Augmented Generation ☆842 Updated 3 months ago
- AG2 (formerly AutoGen): The Open-Source AgentOS. Join us at: https://discord.gg/pAbnFJrkgZ ☆2,552 Updated this week
- SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. ☆6,813 Updated last week
- Verifiers for LLM Reinforcement Learning ☆953 Updated this week