haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆103 Updated last month
Alternatives and similar repositories for Awesome-LLM-Judges
Users who are interested in Awesome-LLM-Judges are comparing it to the repositories listed below.
- Scale your LLM-as-a-judge. ☆232 Updated last week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆171 Updated 4 months ago
- ☆126 Updated 2 months ago
- Open source interpretability artefacts for R1. ☆138 Updated last month
- Train your own SOTA deductive reasoning model ☆92 Updated 2 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆38 Updated 3 weeks ago
- Official repo for Learning to Reason for Long-Form Story Generation ☆58 Updated last month
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆169 Updated this week
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆67 Updated 2 months ago
- ☆57 Updated last week
- Exploring Applications of GRPO ☆229 Updated 2 weeks ago
- Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd… ☆136 Updated this week
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆139 Updated 3 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 Updated 2 months ago
- ☆111 Updated 5 months ago
- ☆152 Updated 6 months ago
- ☆76 Updated last month
- ☆151 Updated 2 months ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 7 months ago
- ☆131 Updated 2 months ago
- look how they massacred my boy ☆63 Updated 7 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆443 Updated 8 months ago
- AWM: Agent Workflow Memory ☆271 Updated 4 months ago
- AGI SDK ☆53 Updated this week
- Just a bunch of benchmark logs for different LLMs ☆118 Updated 10 months ago
- Functional Benchmarks and the Reasoning Gap ☆86 Updated 8 months ago
- Synthetic Data for LLM Fine-Tuning ☆116 Updated last year
- explore token trajectory trees on instruct and base models ☆122 Updated this week
- Plotting (entropy, varentropy) for small LMs ☆96 Updated last week
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆270 Updated this week