haizelabs / verdict
Inference-time scaling for LLMs-as-a-judge.
☆250 Updated last week
Alternatives and similar repositories for verdict
Users who are interested in verdict are comparing it to the libraries listed below.
- ⚖️ Awesome LLM Judges ⚖️ ☆107 Updated 2 months ago
- Red-Teaming Language Models with DSPy ☆202 Updated 5 months ago
- A framework for optimizing DSPy programs with RL ☆89 Updated this week
- Train your own SOTA deductive reasoning model ☆96 Updated 4 months ago
- ☆128 Updated 3 months ago
- Sphynx Hallucination Induction ☆53 Updated 5 months ago
- ☆154 Updated 7 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆447 Updated 9 months ago
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆285 Updated this week
- A small library of LLM judges ☆228 Updated 2 weeks ago
- ☆122 Updated 11 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆84 Updated 9 months ago
- Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd… ☆245 Updated last week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆99 Updated 2 weeks ago
- Use the OpenAI Batch tool to make async batch requests to the OpenAI API. ☆99 Updated last year
- Collection of evals for Inspect AI ☆173 Updated this week
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆207 Updated this week
- A strongly typed Python DSL for developing message passing multi agent systems ☆53 Updated last year
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆245 Updated 9 months ago
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆535 Updated this week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 Updated 4 months ago
- Open source interpretability artefacts for R1. ☆154 Updated 2 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆99 Updated last year
- ☆96 Updated 2 weeks ago
- ☆187 Updated 2 weeks ago
- ☆23 Updated 8 months ago
- ☆261 Updated 3 weeks ago
- ☆55 Updated this week
- Synthetic Data for LLM Fine-Tuning ☆119 Updated last year
- smolLM with Entropix sampler on pytorch ☆150 Updated 8 months ago