qiufengqijun / open-r1-reprod
This is a reproduction of open-r1: it runs GRPO training on Qwen models at 0.5B, 1.5B, 3B, and 7B, and observes some interesting phenomena.
☆30Updated last month
Alternatives and similar repositories for open-r1-reprod
Users that are interested in open-r1-reprod are comparing it to the libraries listed below
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning☆531Updated last week
- ☆216Updated 2 weeks ago
- Real-time updated, fine-grained reading list on LLM-synthetic-data.🔥☆259Updated 4 months ago
- Awesome RL Reasoning Recipes ("Triple R")☆605Updated this week
- ☆210Updated last week
- Collect every awesome work about r1!☆376Updated last month
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆420Updated last month
- DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models)☆164Updated last year
- ☆540Updated 5 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆203Updated 3 months ago
- Minimal-cost training of a 0.5B R1-Zero☆734Updated 3 weeks ago
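- This project covers dataset synthesis, model training, and evaluation for the mathematical problem-solving ability of large models, along with notes on related articles.☆87Updated 8 months ago
- Notes on multimodal knowledge for large language model (LLM) algorithm (application) engineers☆198Updated last year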
- ☆151Updated last month
- Awesome Agent Training☆141Updated this week
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning☆548Updated 2 weeks ago
- Awesome RL-based LLM Reasoning☆511Updated last month
- llm & rl☆139Updated this week
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆332Updated last month
- A series of technical reports on Slow Thinking with LLM☆685Updated this week
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆212Updated this week
- ☆210Updated 2 weeks ago
- ☆269Updated last week
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆141Updated 5 months ago
- Related work and background techniques for OpenAI o1☆221Updated 5 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆123Updated this week
- a-m-team's exploration in large language modeling☆130Updated last week
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆129Updated last month
- ☆144Updated 4 months ago
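- A project for training a large language model from scratch, covering pretraining, fine-tuning, and direct preference optimization; the model has 1B parameters and supports Chinese and English.☆415Updated 3 months ago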