microsoft / rStar
☆518Updated this week
Alternatives and similar repositories for rStar:
Users who are interested in rStar are comparing it to the libraries listed below
- ☆630Updated 2 weeks ago
- A series of technical reports on Slow Thinking with LLMs☆630Updated last week
- Large Reasoning Models☆802Updated 4 months ago
- ☆920Updated 2 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective☆863Updated this week
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"☆503Updated last month
- LIMO: Less is More for Reasoning☆913Updated 2 weeks ago
- ☆282Updated last month
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning☆676Updated this week
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates☆373Updated 2 weeks ago
- An Open-source RL System from ByteDance Seed and Tsinghua AIR☆1,141Updated last week
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning☆455Updated this week
- Automatic evals for LLMs☆370Updated this week
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)☆613Updated 3 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.☆320Updated this week
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".☆248Updated 2 months ago
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.☆1,382Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…☆314Updated 4 months ago
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data …☆676Updated last month
- Recipes to scale inference-time compute of open models☆1,055Updated last month
- RewardBench: the first evaluation tool for reward models.☆553Updated last month
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"☆199Updated 3 weeks ago
- Verifiers for LLM Reinforcement Learning☆813Updated 2 weeks ago
- ☆1,014Updated 4 months ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL☆361Updated last week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆190Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"☆177Updated last week
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym☆430Updated 2 weeks ago
- Training Large Language Models to Reason in a Continuous Latent Space☆1,062Updated 2 months ago
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents☆480Updated last week