CSHaitao / Awesome-LLMs-as-Judges
The official repo for the paper "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods".
☆342 Updated 4 months ago
Alternatives and similar repositories for Awesome-LLMs-as-Judges:
Users who are interested in Awesome-LLMs-as-Judges are comparing it to the libraries listed below:
- Controllable Text Generation for Large Language Models: A Survey ☆170 Updated 7 months ago
- ☆313 Updated 3 weeks ago
- A recipe for online RLHF and online iterative DPO. ☆507 Updated 3 months ago
- [ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc. ☆162 Updated 5 months ago
- This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use". ☆254 Updated last week
- A Survey on Efficient Reasoning for LLMs ☆332 Updated this week
- Recipes to train the self-rewarding reasoning LLMs. ☆213 Updated last month
- Recipes to train reward model for RLHF. ☆1,296 Updated 2 months ago
- ☆153 Updated 3 weeks ago
- ☆267 Updated 8 months ago
- A series of technical reports on Slow Thinking with LLM ☆644 Updated last week
- Codebase for Iterative DPO Using Rule-based Rewards ☆240 Updated last week
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆477 Updated this week
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond ☆191 Updated this week
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆363 Updated 3 months ago
- Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasonin… ☆166 Updated 4 months ago
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… ☆190 Updated last week
- ☆405 Updated this week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆191 Updated last month
- Building a comprehensive and handy list of papers for GUI agents ☆302 Updated last month
- The official repository of our survey paper: "Towards a Unified View of Preference Learning for Large Language Models: A Survey" ☆172 Updated 5 months ago
- This is the repository for the Tool Learning survey. ☆359 Updated last month
- Latest Advances on Long Chain-of-Thought Reasoning ☆241 Updated last week
- Survey of Small Language Models from Penn State, ... ☆172 Updated 3 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆244 Updated last week
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆203 Updated 11 months ago
- ☆283 Updated last month
- adds Sequence Parallelism into LLaMA-Factory ☆464 Updated last week
- Generative AI Act II: Test Time Scaling Drives Cognition Engineering ☆27 Updated last week
- ☆122 Updated this week