mathllm / Step-Controlled_DPO
☆12 · Updated 2 months ago
Related projects:
- Evaluating Mathematical Reasoning Beyond Accuracy · ☆32 · Updated 5 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" · ☆62 · Updated 3 months ago
- ☆46 · Updated 2 weeks ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" · ☆28 · Updated 8 months ago
- ☆23 · Updated 2 months ago
- Official implementation for the paper *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* · ☆57 · Updated 3 weeks ago
- Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". · ☆22 · Updated 5 months ago
- 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623