THUDM / TreeRL
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25
☆61Updated 3 months ago
Alternatives and similar repositories for TreeRL
Users who are interested in TreeRL are comparing it to the libraries listed below
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.☆76Updated last week
- ☆122Updated 6 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆81Updated 3 months ago
- JudgeLRM: Large Reasoning Models as a Judge☆38Updated last week
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of…☆46Updated 3 months ago
- ☆130Updated last week
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆69Updated 2 months ago
- ☆154Updated 3 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models☆134Updated 2 months ago
- ☆36Updated last month
- A Sober Look at Language Model Reasoning☆83Updated last week
- RL Scaling and Test-Time Scaling (ICML'25)☆111Updated 8 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆86Updated 7 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆80Updated 6 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning☆284Updated last week
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"☆69Updated 5 months ago
- ☆94Updated last month
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…☆80Updated 3 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆111Updated 4 months ago
- ☆60Updated 3 months ago
- Test-time preference optimization (ICML 2025).☆167Updated 4 months ago