resistzzz / Co-rewarding
[arXiv:2508.00410] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"
☆30 Updated 3 months ago
Alternatives and similar repositories for Co-rewarding
Users who are interested in Co-rewarding are comparing it to the repositories listed below.
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task ☆36 Updated 9 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆89 Updated 11 months ago
- [arXiv:2508.00410] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"