ai-in-pm / rStar-Math
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
☆38 Updated 2 months ago
Alternatives and similar repositories for rStar-Math:
Users who are interested in rStar-Math compare it to the repositories listed below.
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆66 Updated 2 months ago
- ☆44 Updated 3 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆131 Updated last month
- ☆28 Updated 3 months ago
- The official repository of the Omni-MATH benchmark. ☆77 Updated 3 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆148 Updated last week
- FuseAI Project ☆84 Updated 2 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆158 Updated this week
- ☆102 Updated 3 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆95 Updated 2 months ago
- [preprint] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ☆43 Updated 2 months ago
- ☆60 Updated 3 months ago
- Hammer: Robust Function-Calling for On-Device Language Models via Function Masking ☆63 Updated last month
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆62 Updated last month
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆51 Updated 3 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆74 Updated last week
- ☆103 Updated 2 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆62 Updated 3 weeks ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆49 Updated last week
- On Memorization of Large Language Models in Logical Reasoning ☆56 Updated 4 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆84 Updated last month
- ☆100 Updated 11 months ago
- ☆54 Updated 5 months ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆165 Updated 2 months ago
- Reformatted Alignment ☆115 Updated 6 months ago
- ☆166 Updated last month
- ☆128 Updated this week
- ☆51 Updated 6 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆86 Updated 5 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆101 Updated this week