CSHaitao / Awesome-LLMs-as-Judges
The official repo for the paper "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods".
☆374 · Updated 5 months ago
Alternatives and similar repositories for Awesome-LLMs-as-Judges
Users who are interested in Awesome-LLMs-as-Judges are comparing it to the libraries listed below.
- ☆344 · Updated 2 weeks ago
- Controllable Text Generation for Large Language Models: A Survey · ☆175 · Updated 9 months ago
- A recipe for online RLHF and online iterative DPO. · ☆514 · Updated 5 months ago
- Recipes to train reward model for RLHF. · ☆1,356 · Updated last month
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models · ☆414 · Updated last week
- [ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc. · ☆167 · Updated 6 months ago
- ☆283 · Updated 10 months ago
- verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in… · ☆232 · Updated this week
- Train your Agent model via our easy and efficient framework · ☆776 · Updated this week
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. · ☆409 · Updated last month
- ☆557 · Updated this week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning · ☆213 · Updated 3 weeks ago
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale · ☆248 · Updated 3 weeks ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… · ☆368 · Updated 8 months ago
- A series of technical reports on Slow Thinking with LLM · ☆679 · Updated last week
- Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasonin… · ☆168 · Updated 5 months ago
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … · ☆705 · Updated 2 months ago
- Recipes to train the self-rewarding reasoning LLMs. · ☆219 · Updated 3 months ago
- Generative Judge for Evaluating Alignment · ☆238 · Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024] · ☆554 · Updated 5 months ago
- Project for the paper entitled `Instruction Tuning for Large Language Models: A Survey` · ☆178 · Updated 6 months ago
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond · ☆228 · Updated this week
- ☆193 · Updated last week
- ☆109 · Updated 2 months ago
- ☆195 · Updated 3 weeks ago
- Benchmarking LLMs via Uncertainty Quantification · ☆230 · Updated last year
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning · ☆353 · Updated 8 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ · ☆220 · Updated last year
- ☆208 · Updated last week
- Code and Checkpoints for "Generate rather than Retrieve: Large Language Models are Strong Context Generators" in ICLR 2023. · ☆285 · Updated 2 years ago