JudgmentLabs / judgeval
The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.
☆1,020Updated this week
Alternatives and similar repositories for judgeval
Users who are interested in judgeval are comparing it to the libraries listed below.
- "LLM from Zero to Hero: An End-to-End Large Language Model Journey from Data to Application!"☆141Updated last month
- An alignment auditing agent capable of quickly exploring alignment hypotheses☆874Updated last week
- BharatMLStack is an open-source, end-to-end machine learning infrastructure stack built at Meesho to support real-time and batch ML workl…☆616Updated this week
- OSS RL environment + evals toolkit☆290Updated last week
- LangFair is a Python library for conducting use-case level LLM bias and fairness assessments☆252Updated last month
- An interface library for RL post training with environments.☆1,132Updated this week
- AgentTrace is a lightweight observability library to trace and evaluate agentic systems.☆41Updated 10 months ago
- The official Python SDK for Eval Protocol☆94Updated last week
- ☆237Updated last month
- A catalogue of existing Nanda servers☆190Updated 9 months ago
- Find the Root Cause in Your Code's Trace☆392Updated last week
- A month-long, open-source AI Agent Hackathon — open to all builders and dreamers working on agents, RAG, tool use, and multi-agent system…☆242Updated 7 months ago
- Lightly-reviewed collection of community environments☆210Updated 2 weeks ago
- the os for claude code☆164Updated 3 months ago
- An open-source tool for LLM prompt optimization.☆765Updated 2 weeks ago
- 🚀 MassGen is an open-source multi-agent scaling system that runs in your terminal, autonomously orchestrating frontier models and agents…☆726Updated this week
- ☆101Updated 8 months ago
- On the Theoretical Limitations of Embedding-Based Retrieval☆622Updated 4 months ago
- The CLI for GPUs☆146Updated 2 months ago
- 🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch or based on seed data.☆692Updated this week
- 100+ LLM interview questions with answers.☆721Updated 2 weeks ago
- From idea to production in just a few lines: Graph-Based Programmable Neuro-Symbolic LM Framework - a production-first LM framework built w…☆411Updated last week
- Ranking LLMs on agentic tasks☆211Updated 2 months ago
- The AI Browser Automation Framework☆413Updated this week
- NdLinear by Ensemble is a drop-in PyTorch module that shrinks your models with no accuracy loss. It powers the Ensemble Platform—upload a…☆299Updated 8 months ago
- ⚖️ Awesome LLM Judges ⚖️☆161Updated 9 months ago
- Provider-agnostic, open-source evaluation infrastructure for language models☆719Updated last month
- A repository consisting of paper/architecture replications of classic/SOTA AI/ML papers in PyTorch☆402Updated 2 months ago
- A multi-agent orchestration framework that works with any agent framework☆237Updated 8 months ago
- Context Engineering Course with DSPy☆214Updated 6 months ago