haizelabs / verdict

Inference-time scaling for LLMs-as-a-judge.

☆263

Alternatives and similar repositories for verdict

Users who are interested in verdict are comparing it to the libraries listed below.

haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆108 Updated 3 months ago
Ziems / arbor
A framework for optimizing DSPy programs with RL
☆94 Updated this week
PrimeIntellect-ai / genesys
☆130 Updated 4 months ago
NousResearch / Open-Reasoning-Tasks
A comprehensive repository of reasoning tasks for LLMs (and beyond)
☆448 Updated 10 months ago
haizelabs / dspy-redteam
Red-Teaming Language Models with DSPy
☆203 Updated 5 months ago
haizelabs / sphynx
Sphynx Hallucination Induction
☆53 Updated 6 months ago
AnswerDotAI / fastdata
☆154 Updated 8 months ago
ServiceNow / TapeAgents
TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle
☆288 Updated this week
zbambergerNLP / strategic-debate-tot
A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments
☆87 Updated 10 months ago
567-labs / kura
Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd…
☆259 Updated last month
MadryLab / context-cite
Attribute (or cite) statements generated by LLMs back to in-context information.
☆261 Updated 9 months ago
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆100 Updated this week
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆174 Updated 4 months ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆157 Updated 3 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆101 Updated 4 months ago
brendanhogan / picoDeepResearch
☆64 Updated 2 months ago
NousResearch / atropos
Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse …
☆568 Updated this week
haizelabs / j1-micro
j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.
☆94 Updated 2 weeks ago
redotvideo / pluto
Synthetic Data for LLM Fine-Tuning
☆120 Updated last year
Arize-ai / LLMTest_NeedleInAHaystack
Doing simple retrieval from LLM models at various context lengths to measure accuracy
☆102 Updated last year
anthropic-experimental / agentic-misalignment
☆340 Updated last month
normal-computing / extended-mind-transformers
☆123 Updated 11 months ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆217 Updated last week
pyember / ember
☆194 Updated last month
SpellcraftAI / oaib
Use the OpenAI Batch tool to make async batch requests to the OpenAI API.
☆99 Updated last year
UKGovernmentBEIS / inspect_evals
Collection of evals for Inspect AI
☆198 Updated this week
quotient-ai / judges
A small library of LLM judges
☆241 Updated last month
jerber / arc_agi
☆57 Updated 3 weeks ago
princeton-pli / hal-harness
☆101 Updated last week
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆45 Updated 2 months ago