JudgmentLabs / judgeval
The open-source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.
☆1,020 · Updated this week
Alternatives and similar repositories for judgeval
Users interested in judgeval are comparing it to the libraries listed below.
- BharatMLStack is an open-source, end-to-end machine learning infrastructure stack built at Meesho to support real-time and batch ML workl… ☆614 · Updated this week
- Find the Root Cause in Your Code's Trace ☆389 · Updated this week
- A repository of paper/architecture replications of classic/SOTA AI/ML papers in PyTorch ☆402 · Updated 2 months ago
- The official Python library for the Arklex framework ☆692 · Updated last week
- Repository of implementations of classic and SOTA RL algorithms from scratch in PyTorch ☆221 · Updated 3 weeks ago
- ☆237 · Updated 3 weeks ago
- mahilo: Multi-Agent Human-in-the-Loop Framework is a flexible framework for creating multi-agent systems that can each interact with huma… ☆463 · Updated 9 months ago
- VS Code extension to convert computationally intensive PyTorch kernels to Triton ☆21 · Updated last year
- An alignment auditing agent capable of quickly exploring alignment hypotheses ☆863 · Updated last week
- OSS RL environment + evals toolkit ☆280 · Updated this week
- Orchestrate zero-shot computer vision models ☆393 · Updated last year
- A multi-agent orchestration framework that works with any agent framework ☆236 · Updated 7 months ago
- The best way to create, deploy, and share MCP Servers ☆794 · Updated this week
- Together Open Deep Research ☆356 · Updated 9 months ago
- Hitchcock: a multi-agent movie maker, powered by mahilo ☆69 · Updated 11 months ago
- A catalogue of existing Nanda servers ☆190 · Updated 9 months ago
- Curated collection of real-world applications that use LLMs ☆292 · Updated 9 months ago
- Curated collection of community environments ☆208 · Updated this week
- Tzafon-WayPoint is a robust, scalable solution for managing large fleets of browser instances. WayPoint stands out with unmatched cold‑st… ☆83 · Updated 9 months ago
- ☆86 · Updated 4 months ago
- The official Python SDK for Eval Protocol ☆94 · Updated this week
- ☆10 · Updated last year
- This repo collects the prompting styles of widely used LLMs. ☆36 · Updated last year
- "LLM from Zero to Hero: An End-to-End Large Language Model Journey from Data to Application!" ☆141 · Updated last month
- ☆89 · Updated last week
- agent-from-scratch is a Python-based repository designed for developers and researchers interested in understanding the inner workings of… ☆95 · Updated last year
- A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse range of tasks ☆38 · Updated last year
- Lilac is an open-source tool that ensures your data scientists always have enough GPUs for their work. We seamlessly connect compute from… ☆117 · Updated 5 months ago
- An interface library for RL post-training with environments. ☆1,090 · Updated this week
- 🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch or based on seed data. ☆654 · Updated last week