The open-source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.
☆1,029 · May 9, 2026 · Updated this week
Alternatives and similar repositories for judgeval
Users that are interested in judgeval are comparing it to the libraries listed below.
- PDBClean helps create a curated ensemble of molecular structures · ☆18 · Updated this week
- Workshop materials for AI Engineer World's Fair · ☆16 · Jun 3, 2025 · Updated 11 months ago
- Counterfactual Evaluation and Learning for Interactive Systems: Foundations, Implementations, and Recent Advances · ☆12 · Aug 14, 2022 · Updated 3 years ago
- Code & data for the EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics" · ☆16 · May 3, 2022 · Updated 4 years ago
- AGiXT is a dynamic AI Automation Platform that seamlessly orchestrates instruction management and complex task execution across diverse A… · ☆24 · Jan 26, 2026 · Updated 3 months ago
- Repo for Implementing Research Papers & Projects related to Machine Learning · ☆13 · Feb 9, 2025 · Updated last year
- Anything you want can be built with Morph Cloud · ☆28 · Oct 14, 2025 · Updated 6 months ago
- (WSDM 2022 Best Paper Award Runner-Up) "Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model" · ☆13 · Jul 16, 2023 · Updated 2 years ago
- A PyTorch implementation of "Deep Learning with Logged Bandit Feedback" · ☆10 · Aug 22, 2018 · Updated 7 years ago
- ☆239 · Jan 5, 2026 · Updated 4 months ago
- Openwater's Open-Source Neuromodulation Software · ☆26 · Jul 11, 2024 · Updated last year
- ☆12 · Jul 4, 2022 · Updated 3 years ago
- ☆24 · May 21, 2025 · Updated 11 months ago
- (ICTIR 2020) "Unbiased Pairwise Learning from Biased Implicit Feedback" · ☆19 · Nov 21, 2022 · Updated 3 years ago
- This project integrates Vercel AI-SDK-UI (frontend for chat interfaces) with an Agno agentic Python backend. This integration shows how… · ☆63 · Aug 27, 2025 · Updated 8 months ago
- ☆91 · Oct 2, 2023 · Updated 2 years ago
- Building self-refined guardrails via DSPy · ☆14 · Jul 2, 2024 · Updated last year
- ☆23 · Jan 3, 2025 · Updated last year
- Exploring advanced prompting tools to query a SQL database with multiple tables in natural language using LLMs · ☆16 · Aug 23, 2024 · Updated last year
- Prompt-to-Leaderboard · ☆276 · May 9, 2025 · Updated last year
- Adversarial learning framework to enhance long-tail recommendation in Neural Collaborative Filtering · ☆21 · Nov 29, 2018 · Updated 7 years ago
- An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, … · ☆51 · Mar 25, 2026 · Updated last month
- A Python Natural Language Processing Toolkit for Electronic Health Record Texts · ☆13 · May 24, 2023 · Updated 2 years ago
- Free question bank for quant interviews · ☆32 · Mar 24, 2025 · Updated last year
- AevaScenes Python SDK · ☆49 · Nov 6, 2025 · Updated 6 months ago
- 🐢 Open-Source Evaluation & Testing library for LLM Agents · ☆5,334 · Updated this week
- Emotional status bar for Claude Code: dual-channel emotional transparency with a research-backed model · ☆53 · Apr 16, 2026 · Updated 3 weeks ago
- Public code release for the paper "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training" · ☆11 · Oct 27, 2025 · Updated 6 months ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions" · ☆17 · Feb 9, 2026 · Updated 3 months ago
- Pragmatic approach to parsing import profiles for CIs · ☆12 · Jul 1, 2024 · Updated last year
- Custom launcher for Claude Code, supporting dynamic prompts, layered configuration, and easy custom hooks and MCPs · ☆16 · May 1, 2026 · Updated last week
- Sequence Tagging for Biomedical Extractive Question Answering (Bioinformatics 2020) · ☆11 · Jul 3, 2023 · Updated 2 years ago
- Evals for agents · ☆15 · Dec 4, 2024 · Updated last year
- A Python wrapper around Hugging Face's TGI (text-generation-inference) and TEI (text-embedding-inference) servers · ☆32 · Sep 19, 2025 · Updated 7 months ago
- Posture correction using computer vision and the MediaPipe library enables the detection and correction of poor posture in images and live vi… · ☆11 · Apr 9, 2025 · Updated last year
- Arena-Hard-Auto: An automatic LLM benchmark · ☆1,018 · Jun 21, 2025 · Updated 10 months ago
- SkyRL: A Modular Full-stack RL Library for LLMs · ☆1,806 · May 3, 2026 · Updated last week
- Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement… · ☆9,433 · Updated this week
- Code for the ICML paper "zero-inflated exponential family embedding" · ☆29 · Nov 2, 2017 · Updated 8 years ago