SIMONLQY / RethinkMCTS
☆31 Updated last year
Alternatives and similar repositories for RethinkMCTS
Users who are interested in RethinkMCTS are comparing it to the libraries listed below.
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆39 Updated 2 years ago
- e ☆43 Updated 9 months ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) ☆73 Updated last year
- ☆34 Updated last year
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆50 Updated last year
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆65 Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆120 Updated 9 months ago
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆74 Updated last year
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning ☆149 Updated 4 months ago
- GenRM-CoT: Data release for verification rationales ☆68 Updated last year
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆96 Updated 10 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆120 Updated last week
- ☆53 Updated last year
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement) ☆50 Updated 2 years ago
- ☆90 Updated 3 months ago
- ☆103 Updated 2 years ago
- Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling' ☆27 Updated 8 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆115 Updated 2 years ago
- ☆56 Updated last year
- Directional Preference Alignment ☆58 Updated last year
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆64 Updated last year
- Evaluate the Quality of Critique ☆36 Updated last year
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆159 Updated last year
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning ☆53 Updated last year
- RL Scaling and Test-Time Scaling (ICML'25) ☆113 Updated last year
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding ☆28 Updated 2 years ago
- ☆41 Updated last year
- [EMNLP 2025] Verification Engineering for RL in Instruction Following ☆50 Updated last month
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆124 Updated last year
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆51 Updated last year