haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆90Updated last month
Alternatives and similar repositories for Awesome-LLM-Judges:
Users who are interested in Awesome-LLM-Judges are comparing it to the repositories listed below:
- Verdict is a library for scaling judge-time compute.☆195Updated 3 weeks ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆170Updated 3 months ago
- Train your own SOTA deductive reasoning model☆83Updated last month
- Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd…☆99Updated this week
- 🤗 Benchmark Large Language Models Reliably On Your Data☆233Updated this week
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning☆102Updated this week
- ☆120Updated 3 weeks ago
- OpenPipe ART (Agent Reinforcement Trainer): train LLM agents☆77Updated this week
- ☆145Updated last month
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆60Updated 3 weeks ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond)☆429Updated 6 months ago
- Letting Claude Code develop his own MCP tools :)☆97Updated last month
- A user interface for DSPy☆142Updated 5 months ago
- Synthetic Data for LLM Fine-Tuning☆113Updated last year
- Red-Teaming Language Models with DSPy☆181Updated 2 months ago
- ☆112Updated 3 months ago
- ☆53Updated 2 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆139Updated last month
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments☆77Updated 6 months ago
- Functional Benchmarks and the Reasoning Gap☆85Updated 6 months ago
- An automated tool for discovering insights from research paper corpora☆137Updated 10 months ago
- AWM: Agent Workflow Memory☆257Updated 2 months ago
- ☆71Updated 2 months ago
- ☆37Updated 2 months ago
- Sphynx Hallucination Induction☆53Updated 2 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"☆175Updated this week
- Code for ScribeAgent paper☆55Updated last month
- ☆151Updated 4 months ago
- Just a bunch of benchmark logs for different LLMs☆119Updated 8 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.☆89Updated this week