haizelabs / verdict
Scale your LLM-as-a-judge.
☆232 Updated last week
Alternatives and similar repositories for verdict
Users interested in verdict are comparing it to the libraries listed below.
- ⚖️ Awesome LLM Judges ⚖️ ☆103 Updated last month
- ☆126 Updated 2 months ago
- A framework for optimizing DSPy programs with RL ☆58 Updated this week
- Collection of evals for Inspect AI ☆139 Updated this week
- ☆152 Updated 6 months ago
- Open source interpretability artefacts for R1. ☆138 Updated last month
- Red-Teaming Language Models with DSPy ☆193 Updated 3 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 Updated 2 months ago
- ☆76 Updated last month
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆94 Updated this week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆171 Updated 4 months ago
- Sphynx Hallucination Induction ☆54 Updated 4 months ago
- ☆86 Updated 3 weeks ago
- ☆178 Updated last month
- ☆131 Updated 2 months ago
- Train your own SOTA deductive reasoning model ☆92 Updated 2 months ago
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆270 Updated this week
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆169 Updated this week
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆235 Updated 7 months ago
- Functional Benchmarks and the Reasoning Gap ☆86 Updated 8 months ago
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆447 Updated this week
- Official repo for Learning to Reason for Long-Form Story Generation ☆58 Updated last month
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆80 Updated 8 months ago
- Extract full next-token probabilities via language model APIs ☆248 Updated last year
- A strongly typed Python DSL for developing message passing multi agent systems ☆53 Updated last year
- ☆41 Updated 4 months ago
- METR Task Standard ☆147 Updated 3 months ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆238 Updated 3 months ago
- Prompt engineering, automated. ☆321 Updated last month
- ☆119 Updated 9 months ago