sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆908 · Updated 3 weeks ago
Alternatives and similar repositories for understand-r1-zero:
Users who are interested in understand-r1-zero are comparing it to the repositories listed below.
- Large Reasoning Models · ☆804 · Updated 5 months ago
- LIMO: Less is More for Reasoning · ☆927 · Updated last month
- Official Repo for Open-Reasoner-Zero · ☆1,904 · Updated last month
- ☆671 · Updated last week
- Recipes to scale inference-time compute of open models · ☆1,066 · Updated 2 months ago
- An Open-source RL System from ByteDance Seed and Tsinghua AIR · ☆1,198 · Updated 3 weeks ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. · ☆338 · Updated this week
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) · ☆619 · Updated 3 months ago
- ☆524 · Updated 3 weeks ago
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. · ☆1,698 · Updated this week
- A series of technical reports on Slow Thinking with LLMs · ☆659 · Updated 3 weeks ago
- ☆287 · Updated last month
- Muon is Scalable for LLM Training · ☆1,039 · Updated last month
- ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning · ☆808 · Updated last week
- Explore the Multimodal “Aha Moment” on 2B Model · ☆583 · Updated last month
- ☆739 · Updated 2 weeks ago
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" · ☆436 · Updated last month
- A bibliography and survey of the papers surrounding o1 · ☆1,191 · Updated 5 months ago
- OLMoE: Open Mixture-of-Experts Language Models · ☆739 · Updated last month
- ☆924 · Updated 3 months ago
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward · ☆890 · Updated 2 months ago
- O1 Replication Journey · ☆1,986 · Updated 3 months ago
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models · ☆350 · Updated last week
- Training Large Language Model to Reason in a Continuous Latent Space · ☆1,094 · Updated 3 months ago
- Scalable RL solution for advanced reasoning of language models · ☆1,529 · Updated last month
- Awesome RL Reasoning Recipes ("Triple R") · ☆520 · Updated this week
- Unleashing the Power of Reinforcement Learning for Math and Code Reasoners · ☆540 · Updated 2 weeks ago
- TTRL: Test-Time Reinforcement Learning · ☆407 · Updated last week
- [ICML 2024] CLLMs: Consistency Large Language Models · ☆391 · Updated 5 months ago
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates · ☆382 · Updated last month