JudgmentLabs / judgeval
The open-source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.
☆1,010Updated this week
Alternatives and similar repositories for judgeval
Users who are interested in judgeval are comparing it to the libraries listed below.
- OSS RL environment + evals toolkit☆189Updated this week
- Pixeltable — Data Infrastructure providing a declarative, incremental approach for multimodal AI workloads.☆987Updated this week
- 🚀 MassGen: An Open-source Multi-Agent Scaling System Inspired by Grok Heavy and Gemini Deep Think. Join the discord channel: https://dis…☆554Updated this week
- LangFair is a Python library for conducting use-case level LLM bias and fairness assessments☆236Updated last week
- On the Theoretical Limitations of Embedding-Based Retrieval☆582Updated last month
- ☆1,128Updated last week
- A tutorial on how to use Model Context Protocol by Anthropic and Agent2Agent Protocol by Google☆96Updated 5 months ago
- The AI Browser Automation Framework☆294Updated this week
- An encyclopedia of jailbreaking techniques to make AI models safer.☆519Updated 4 months ago
- Dynamiq is an orchestration framework for agentic AI and LLM applications☆931Updated last week
- An open-source tool for LLM prompt optimization.☆657Updated 2 weeks ago
- Tool for generating high-quality synthetic datasets☆1,282Updated 2 weeks ago
- Implement a reasoning LLM in PyTorch from scratch, step by step☆1,710Updated this week
- An MCP Multimodal AI Agent with eyes and ears!☆461Updated last month
- One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation t…☆344Updated last month
- Anemoi: A Semi-Centralized Multi-agent System Based on Agent-to-Agent Communication MCP server from Coral Protocol☆367Updated last month
- NdLinear by Ensemble is a drop-in PyTorch module that shrinks your models with no accuracy loss. It powers the Ensemble Platform—upload a…☆303Updated 4 months ago
- MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers☆350Updated last week
- A catalogue of existing Nanda servers☆187Updated 5 months ago
- The official Python library for Arklex framework☆685Updated last week
- Agent File (.af): An open file format for serializing stateful AI agents with persistent memory and behavior. Share, checkpoint, and vers…☆942Updated 4 months ago
- the os for claude code☆169Updated this week
- ☆16Updated 2 months ago
- A CLI for GPUs☆112Updated 2 weeks ago
- Cookbooks for AI Agents☆149Updated 5 months ago
- A category-wise collection of 200+ LLM survey papers.☆179Updated 6 months ago
- Notion for AI Observability 📊☆309Updated this week
- ☆11Updated last month
- MCP-Universe is a comprehensive framework designed for developing, testing, and benchmarking AI agents☆450Updated last week
- ☆402Updated 3 weeks ago