microsoft / rStar
☆524 Updated 3 weeks ago
Alternatives and similar repositories for rStar:
Users who are interested in rStar are comparing it to the repositories listed below.
- ☆683 Updated last week
- ☆928 Updated 3 months ago
- A series of technical reports on Slow Thinking with LLM ☆659 Updated 3 weeks ago
- LIMO: Less is More for Reasoning ☆933 Updated last month
- ☆287 Updated last month
- Large Reasoning Models ☆804 Updated 5 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆915 Updated 3 weeks ago
- ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning ☆835 Updated last week
- Recipes to scale inference-time compute of open models ☆1,068 Updated this week
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆518 Updated last month
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆619 Updated 3 months ago
- Training Large Language Model to Reason in a Continuous Latent Space ☆1,104 Updated 3 months ago
- [ICML 2025 Spotlight] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction ☆520 Updated this week
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆222 Updated last month
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ☆1,219 Updated last month
- TTRL: Test-Time Reinforcement Learning ☆452 Updated last week
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆353 Updated 8 months ago
- ☆328 Updated 3 months ago
- ☆1,019 Updated 4 months ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". ☆254 Updated 2 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆198 Updated this week
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models ☆372 Updated 2 weeks ago
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates ☆382 Updated this week
- An Open Large Reasoning Model for Real-World Solutions ☆1,488 Updated 2 months ago
- AN O1 REPLICATION FOR CODING ☆333 Updated 5 months ago
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆697 Updated last month
- Search-o1: Agentic Search-Enhanced Large Reasoning Models ☆851 Updated this week
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning ☆458 Updated this week
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆373 Updated last week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆235 Updated 3 weeks ago