The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.
☆1,017 · Mar 2, 2026 · Updated this week
Alternatives and similar repositories for judgeval
Users who are interested in judgeval compare it to the libraries listed below.
- (WSDM2022 Best Paper Award Runner-Up) "Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model" ☆13 · Jul 16, 2023 · Updated 2 years ago
- Counterfactual Evaluation and Learning for Interactive Systems: Foundations, Implementations, and Recent Advances ☆12 · Aug 14, 2022 · Updated 3 years ago
- A Pytorch implementation of "Deep Learning with Logged Bandit Feedback" ☆10 · Aug 22, 2018 · Updated 7 years ago
- Supercharge your LeetCode practice - Add to Friends, Premium Features, per Contest Friends Rating, and more! ☆16 · Jan 13, 2026 · Updated last month
- ☆12 · Jul 4, 2022 · Updated 3 years ago
- Code Repository for CrewAI Lightning Lessons Series ☆36 · Nov 10, 2025 · Updated 3 months ago
- ☆18 · Apr 25, 2023 · Updated 2 years ago
- anything you want can be built with morph cloud ☆27 · Oct 14, 2025 · Updated 4 months ago
- (ICTIR2020) "Unbiased Pairwise Learning from Biased Implicit Feedback" ☆19 · Nov 21, 2022 · Updated 3 years ago
- Openwater's Open-Source Neuromodulation Software ☆26 · Jul 11, 2024 · Updated last year
- Visual feedback from browser to AI. Click elements, add comments, fix code. ☆60 · Feb 2, 2026 · Updated last month
- Code for the experiments of Matrix Factorization Bandit ☆24 · Feb 4, 2019 · Updated 7 years ago
- Deep Reinforcement Learning by using Truly Proximal Policy Optimization in Tensorflow 2 and Pytorch ☆22 · Nov 9, 2025 · Updated 3 months ago
- Evaluate LLM-synthesized @JuliaLang code. ☆26 · Aug 17, 2024 · Updated last year
- (SIGIR2020) "Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback" ☆21 · Nov 21, 2022 · Updated 3 years ago
- (RecSys2020) "Doubly Robust Estimator for Ranking Metrics with Post-Click Conversions" ☆24 · Mar 25, 2023 · Updated 2 years ago
- (WSDM2020) "Unbiased Recommender Learning from Missing-Not-At-Random Implicit Feedback" ☆25 · Mar 24, 2023 · Updated 2 years ago
- ☆31 · Nov 14, 2024 · Updated last year
- A curated list of awesome open source libraries to deploy, monitor, version and scale your generative artificial intelligence application… ☆66 · Updated this week
- ☆30 · Jun 22, 2020 · Updated 5 years ago
- A simple service for running hundreds of lighthouse tests in parallel via Google Cloud Tasks and Cloud Run. Includes options for blocking… ☆30 · Apr 10, 2023 · Updated 2 years ago
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆74 · Aug 31, 2024 · Updated last year
- A red teaming agent ☆18 · Oct 15, 2025 · Updated 4 months ago
- Kaggle: Quora Insincere Questions Classification - detect toxic content to improve online conversations ☆36 · Dec 23, 2018 · Updated 7 years ago
- Deploy RAGs quickly, anywhere ☆12 · Nov 20, 2025 · Updated 3 months ago
- B-Spline Density Estimation Library - nonparametric density estimation using B-Spline density estimator from univariate sample. ☆16 · Aug 22, 2021 · Updated 4 years ago
- ☆12 · Updated this week
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 5 months ago
- (WSDM2020) "Unbiased Recommender Learning from Missing-Not-At-Random Implicit Feedback" ☆30 · Nov 21, 2022 · Updated 3 years ago
- Training setup for Langchain's Open Deep Research ☆75 · Aug 28, 2025 · Updated 6 months ago
- Run SWE-bench evaluations remotely ☆58 · Aug 14, 2025 · Updated 6 months ago
- Kinematic and dynamic models of continuum and articulated soft robots. ☆15 · Nov 22, 2025 · Updated 3 months ago
- QRSS Plus: live QRSS grabbers from around the world ☆10 · Feb 9, 2026 · Updated 3 weeks ago
- ☆10 · Jun 24, 2020 · Updated 5 years ago
- The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem. | Oyster … ☆59 · Sep 11, 2025 · Updated 5 months ago
- CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics ☆27 · Nov 1, 2025 · Updated 4 months ago
- Automatic point cloud processing tools for Matlab to characterize trees from terrestrial laser scanning point clouds ☆10 · Oct 4, 2023 · Updated 2 years ago
- (RecSys 2020) Adaptively Distilled Exemplar Replay towards Continual Learning for Session-based Recommendation [Best Short Paper] ☆33 · May 3, 2024 · Updated last year
- Code for the paper: Dense Reward for Free in Reinforcement Learning from Human Feedback (ICML 2024) by Alex J. Chan, Hao Sun, Samuel Holt… ☆38 · Aug 11, 2024 · Updated last year