zli12321 / free-form-grpo
GRPO training for long-form QA and instructions with a long-form reward model
☆16 Updated 5 months ago
Alternatives and similar repositories for free-form-grpo
Users who are interested in free-form-grpo are comparing it to the repositories listed below.
- ☆51 Updated last year
- ☆11 Updated last year
- ☆53 Updated 3 months ago
- BeHonest: Benchmarking Honesty in Large Language Models ☆34 Updated last year
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization" ☆41 Updated last year
- ☆51 Updated 5 months ago
- ☆47 Updated 9 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆76 Updated 3 months ago
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning. ☆24 Updated 3 months ago
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style ☆73 Updated 5 months ago
- ☆24 Updated 9 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆72 Updated 8 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆26 Updated last year
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM… ☆67 Updated last year
- ☆77 Updated last year
- ☆70 Updated last year
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution" ☆61 Updated 3 months ago
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆70 Updated 9 months ago
- ☆55 Updated last year
- Reinforced Multi-LLM Agents training ☆65 Updated 7 months ago
- ☆22 Updated last year
- 🔍 Awesome Agentic Search is a curated list of papers, tools, and resources on agentic search—where AI agents plan, search, and reason to… ☆50 Updated 4 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆68 Updated last year
- [EMNLP-2025] R1-Zero on ANY TASK ☆27 Updated 2 months ago
- ☆17 Updated 8 months ago
- ☆35 Updated 2 weeks ago
- [ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization ☆32 Updated 11 months ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆63 Updated last year
- Exploration of automated dataset selection approaches at large scales. ☆53 Updated 10 months ago
- ☆49 Updated 9 months ago