InternLM / OREAL
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆192 · Updated 10 months ago
Alternatives and similar repositories for OREAL
Users who are interested in OREAL are comparing it to the libraries listed below.
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆147 · Updated 9 months ago
- ☆328 · Updated 8 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆260 · Updated 8 months ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆164 · Updated 4 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations