haizelabs / verdict
Verdict is a library for scaling judge-time compute.
☆202 Updated this week
Alternatives and similar repositories for verdict:
Users who are interested in verdict compare it to the libraries listed below.
- ⚖️ Awesome LLM Judges ⚖️ ☆94 Updated last week
- ☆123 Updated last month
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆171 Updated this week
- Train your own SOTA deductive reasoning model ☆91 Updated last month
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆150 Updated this week
- Sphynx Hallucination Induction ☆54 Updated 3 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆437 Updated 7 months ago
- ☆151 Updated 5 months ago
- ☆97 Updated 6 months ago
- Functional Benchmarks and the Reasoning Gap ☆85 Updated 7 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 3 months ago
- Red-Teaming Language Models with DSPy ☆188 Updated 2 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆78 Updated 7 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆172 Updated last month
- Open source interpretability artefacts for R1. ☆103 Updated 2 weeks ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆99 Updated last year
- ☆73 Updated this week
- ☆73 Updated last week
- Using various instructor clients evaluating the quality and capabilities of extractions and reasoning. ☆51 Updated 7 months ago
- Claude Deep Research config for Claude Code. ☆170 Updated last month
- ☆130 Updated last month
- ☆22 Updated 6 months ago
- smolLM with Entropix sampler on pytorch ☆151 Updated 6 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆90 Updated this week
- Use the OpenAI Batch tool to make async batch requests to the OpenAI API. ☆98 Updated last year
- Tutorial for building LLM router ☆198 Updated 9 months ago
- AWM: Agent Workflow Memory ☆268 Updated 3 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆228 Updated 6 months ago
- ☆109 Updated 4 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆104 Updated 4 months ago