ai-in-pm / rStar-Math
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
☆39Updated 6 months ago
Alternatives and similar repositories for rStar-Math
Users that are interested in rStar-Math are comparing it to the libraries listed below
- ☆102Updated 7 months ago
- Efficient Agent Training for Computer Use☆114Updated last month
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆145Updated 6 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems☆85Updated 3 months ago
- ☆47Updated last month
- RL Scaling and Test-Time Scaling (ICML'25)☆108Updated 5 months ago
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso…☆121Updated 3 months ago
- ☆33Updated last month
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆195Updated last week
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization"☆79Updated last month
- [ICML 2025] Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search☆103Updated last month
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]☆163Updated last week
- ☆132Updated last month
- Repo for "Z1: Efficient Test-time Scaling with Code"☆63Updated 3 months ago
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"☆150Updated last month
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆106Updated 4 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆188Updated 3 months ago
- The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond☆152Updated last week
- ☆147Updated 5 months ago
- ☆59Updated last month
- ☆142Updated 2 months ago
- [COLM 2025] An Open Math Pre-training Dataset with 370B Tokens.☆95Updated 3 months ago
- ☆90Updated 2 months ago
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents☆112Updated 3 months ago
- Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning☆197Updated 2 weeks ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models☆143Updated last month
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis☆146Updated 2 weeks ago
- ☆64Updated 7 months ago
- ☆77Updated 3 months ago
- AN O1 REPLICATION FOR CODING☆335Updated 7 months ago