haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆94 · Updated last week
Alternatives and similar repositories for Awesome-LLM-Judges:
Users interested in Awesome-LLM-Judges are comparing it to the libraries listed below:
- Verdict is a library for scaling judge-time compute. ☆202 · Updated last week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 · Updated 3 months ago
- Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd… ☆104 · Updated 3 weeks ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆65 · Updated last month
- ☆123 · Updated last month
- Train your own SOTA deductive reasoning model ☆91 · Updated 2 months ago
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆257 · Updated this week
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆150 · Updated last week
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆17 · Updated last week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆172 · Updated 2 months ago
- Functional Benchmarks and the Reasoning Gap ☆85 · Updated 7 months ago
- ☆147 · Updated 2 months ago
- Open source interpretability artefacts for R1. ☆105 · Updated 2 weeks ago
- ☆73 · Updated this week
- Letting Claude Code develop his own MCP tools :) ☆98 · Updated last month
- Red-Teaming Language Models with DSPy ☆188 · Updated 2 months ago
- Solving data for LLMs - Create quality synthetic datasets! ☆146 · Updated 3 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆104 · Updated 4 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆99 · Updated last year
- Sphynx Hallucination Induction ☆54 · Updated 3 months ago
- AWM: Agent Workflow Memory ☆269 · Updated 3 months ago
- ☆151 · Updated 5 months ago
- accompanying material for sleep-time compute paper ☆77 · Updated last week
- ☆54 · Updated 3 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆139 · Updated 2 months ago
- ☆130 · Updated last month
- Prompt design in Python ☆57 · Updated 5 months ago
- ☆55 · Updated this week
- ☆22 · Updated 6 months ago
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆264 · Updated this week