yuanzhoulvpi2017 / nano_rl
Custom reward development on top of verl
☆117Updated 4 months ago
Alternatives and similar repositories for nano_rl
Users that are interested in nano_rl are comparing it to the libraries listed below
- ☆47Updated 7 months ago
- ☆145Updated last year
- A live reading list for LLM data synthesis (Updated to July, 2025).☆376Updated last month
- llm & rl☆219Updated 2 weeks ago
- a-m-team's exploration in large language modeling☆188Updated 4 months ago
- Reinforcement Learning in LLM and NLP.☆60Updated 3 weeks ago
- ☆83Updated last year
- ☆352Updated 3 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆149Updated 9 months ago
- ☆118Updated last year
- Related work and background techniques for OpenAI o1☆222Updated 8 months ago
- A comprehensive collection of process reward models.☆110Updated 2 months ago
- 🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning☆262Updated 3 weeks ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆395Updated 3 months ago
- An Awesome List of Agentic Models Trained with Reinforcement Learning☆483Updated 2 weeks ago
- Awesome Agent Training☆231Updated last month
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆136Updated 2 months ago
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models☆65Updated 7 months ago
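- Dataset synthesis, model training, and evaluation for the math problem-solving ability of large models, along with notes on related articles.☆95Updated last year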
- Fantastic Data Engineering for Large Language Models☆90Updated 9 months ago
- ☆163Updated last year
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…☆246Updated last month
- ☆269Updated 2 months ago
- ☆406Updated last month
- Fine-tuning large language models with the DPO algorithm; simple and easy to get started.☆45Updated last year
- A curated list of awesome works in Routing LLMs paradigm (👉 Welcome to submit your contributions to this code repository)☆63Updated 2 months ago
- Full-parameter, LoRA, and QLoRA fine-tuning of llama3.☆210Updated last year
- ☆549Updated 9 months ago
- Advanced interview notes for large language models☆72Updated 4 months ago
- A series of technical reports on Slow Thinking with LLM☆739Updated last month