Gen-Verse / ReasonFlux
ReasonFlux Series - A family of LLM post-training algorithms focusing on data selection, reinforcement learning, and inference scaling
☆485Updated last month
Alternatives and similar repositories for ReasonFlux
Users who are interested in ReasonFlux are comparing it to the libraries listed below
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆251Updated 3 months ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".☆271Updated 6 months ago
- A series of technical reports on Slow Thinking with LLM☆727Updated last month
- ☆330Updated 3 months ago
- ☆315Updated 3 months ago
- ☆200Updated last month
- AN O1 REPLICATION FOR CODING☆334Updated 9 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆245Updated 4 months ago
- ☆439Updated last week
- A version of verl to support diverse tool use☆474Updated this week
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆295Updated this week
- ☆283Updated 3 months ago
- A MemAgent framework that can be extrapolated to 3.5M, along with a training framework for RL training of any agent workflow.☆649Updated last month
- Scaling RL on advanced reasoning models☆583Updated last month
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆243Updated 4 months ago
- Official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example”☆355Updated last week
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"☆257Updated 4 months ago
- Code for the paper: "Learning to Reason without External Rewards"☆351Updated 2 months ago
- ☆209Updated 6 months ago
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning☆630Updated last month
- ✨ Agentic Reinforced Policy Optimization☆569Updated last week
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models☆149Updated 3 months ago
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"☆159Updated 3 months ago
- ☆210Updated 3 weeks ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.☆152Updated 2 months ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL☆450Updated 3 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆190Updated 5 months ago
- ☆330Updated last month
- 🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning☆252Updated this week
- 📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.☆203Updated 3 weeks ago