yuanzhoulvpi2017 / nano_rl
Custom reward development on top of verl
☆91 Updated 2 months ago
Alternatives and similar repositories for nano_rl
Users that are interested in nano_rl are comparing it to the libraries listed below
- ☆47 Updated 5 months ago
- llm & rl ☆176 Updated this week
- A live reading list for LLM-synthetic-data. ☆343 Updated this week
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆146 Updated 7 months ago
- a-m-team's exploration in large language modeling ☆178 Updated 2 months ago
- ☆144 Updated last year
- The related works and background techniques about OpenAI o1 ☆224 Updated 6 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆132 Updated 2 weeks ago
- ☆300 Updated last month
- A curated list of awesome works in Routing LLMs paradigm (👉 Welcome to submit your contributions to this code repository) ☆50 Updated 3 weeks ago
- ☆84 Updated last year
- Fantastic Data Engineering for Large Language Models ☆89 Updated 7 months ago
- Fine-tune large language models with the DPO algorithm; simple and easy to get started. ☆40 Updated last year
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models ☆64 Updated 5 months ago
- A comprehensive collection of process reward models. ☆96 Updated 2 weeks ago
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… ☆241 Updated last month
- ☆300 Updated 2 months ago
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning ☆710 Updated 2 weeks ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆275 Updated 3 weeks ago
- An Awesome List of Reinforcement Learning-based Large Language Agent Works. Collect directly from official code base. ☆238 Updated this week
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ☆381 Updated last month
- ☆252 Updated 3 weeks ago
- A series of technical reports on Slow Thinking with LLM ☆713 Updated last month
- Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆218 Updated last week
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment ☆360 Updated last year
- Awesome Agent Training ☆204 Updated last week
- ☆544 Updated 7 months ago
- ☆149 Updated 10 months ago
- ☆112 Updated last year
- Reinforcement Learning in LLM and NLP. ☆47 Updated this week