quotient-ai / judges
A small library of LLM judges
☆323 · Jul 31, 2025 · Updated 6 months ago
Alternatives and similar repositories for judges
Users who are interested in judges are comparing it to the libraries listed below.
- splits videos into scenes with gpt-4o-mini and saves them separately · ☆12 · Dec 19, 2024 · Updated last year
- ☆12 · Apr 26, 2024 · Updated last year
- ☆23 · Jun 5, 2025 · Updated 8 months ago
- TaskWeaver Plugins · ☆12 · Jan 28, 2024 · Updated 2 years ago
- moodist · ☆24 · Jan 6, 2026 · Updated last month
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. · ☆1,593 · Dec 20, 2025 · Updated last month
- ☆19 · Mar 16, 2025 · Updated 11 months ago
- [ICML '24] R2E: Turn any GitHub Repository into a Programming Agent Environment · ☆140 · Apr 20, 2025 · Updated 9 months ago
- Async RL Training at Scale · ☆1,071 · Updated this week
- Inference-time scaling for LLMs-as-a-judge. · ☆329 · Nov 5, 2025 · Updated 3 months ago
- ☆21 · Jun 4, 2024 · Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. · ☆20 · Jun 3, 2024 · Updated last year
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes · ☆213 · Sep 18, 2025 · Updated 4 months ago
- ☆20 · Jan 7, 2024 · Updated 2 years ago
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI · ☆222 · Apr 29, 2024 · Updated last year
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… · ☆3,852 · May 17, 2025 · Updated 8 months ago
- ☆40 · Jul 26, 2024 · Updated last year
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … · ☆855 · Updated this week
- Foyle is a copilot to help developers deploy and operate their applications. · ☆133 · Mar 17, 2025 · Updated 11 months ago
- ☆11 · Aug 25, 2021 · Updated 4 years ago
- ☆38 · Apr 17, 2024 · Updated last year
- Leverage your LangChain trace data for fine tuning · ☆46 · Aug 2, 2024 · Updated last year
- Analyzing the most strategic words to guess on Wordle, based on letter frequency distributions · ☆11 · Feb 20, 2022 · Updated 3 years ago
- ☆13 · Nov 5, 2024 · Updated last year
- ☆10 · Dec 3, 2020 · Updated 5 years ago
- The official implementation of Self-Play Fine-Tuning (SPIN) · ☆1,234 · May 8, 2024 · Updated last year
- Agent Engineering course files · ☆71 · Jul 12, 2025 · Updated 7 months ago
- Fast Multimodal Semantic Deduplication & Filtering · ☆886 · Jan 20, 2026 · Updated 3 weeks ago
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models · ☆28 · Nov 25, 2024 · Updated last year
- Tooling for exact and MinHash deduplication of large-scale text datasets · ☆68 · Feb 4, 2026 · Updated last week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. · ☆2,885 · Updated this week
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding · ☆2,703 · Jan 9, 2026 · Updated last month
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… · ☆2,054 · Dec 3, 2025 · Updated 2 months ago
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets · ☆4,852 · Feb 9, 2026 · Updated last week
- ☆10 · Nov 23, 2020 · Updated 5 years ago
- A duckdb extension that executes js (provided by you or generated via OpenAI) in an embedded v8 interpreter and returns a table · ☆19 · Jun 9, 2025 · Updated 8 months ago
- Semantic Ranking Solution for Azure Database for PostgreSQL · ☆14 · Apr 29, 2025 · Updated 9 months ago
- decontamination · ☆24 · Dec 3, 2025 · Updated 2 months ago
- Code for reproducing our paper: LMSOC: An Approach for Socially Sensitive Pretraining · ☆13 · Oct 22, 2021 · Updated 4 years ago