yuanzhoulvpi2017/nano_rl

yuanzhoulvpi2017 / nano_rl

Custom reward development on top of verl

☆182

Alternatives and similar repositories for nano_rl

Users that are interested in nano_rl are comparing it to the libraries listed below.

yuanzhoulvpi2017 / zero_agent
☆38 · Apr 19, 2026 · Updated 3 months ago
chunhuizhang / llm_rl
llm & rl
☆291 · Oct 24, 2025 · Updated 9 months ago
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,667 · Updated this week
wyf3 / llm_related
Reproductions of LLM-related algorithms, plus some study notes
☆3,465 · Jul 2, 2026 · Updated 3 weeks ago
alibaba / ROLL
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
☆3,327 · Updated this week
MraDonkey / rethinking_prompting
[ACL 2025 Main] (🏆 Outstanding Paper Award) Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Proba…
☆18 · Aug 15, 2025 · Updated 11 months ago
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,853 · Jul 14, 2026 · Updated last week
sylvain-wei / 24-Game-Reasoning
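A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the 24 Game as the example. It applies zero-RL, SFT, and SFT+RL to elicit the LLM's capacity for self-verification and reflection. Clean, minimal, accessible reproduction of Dee…
☆35 · Apr 5, 2025 · Updated last year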
zhaoyingjun / Tiny-R2
Tiny-R2: A hybrid architecture integrating SWA, CSA, HCA, mHC, and DSMoE under the DeepSeek V4 design paradigm, enabling single-GPU OPD p…
☆46 · May 30, 2026 · Updated last month
Jianglin954 / awesome-on-policy-distillation
A curated list of resources on on-policy distillation
☆25 · Apr 13, 2026 · Updated 3 months ago
Xinyi-0724 / Search-R1-Qwen3
Enhanced Search-R1 Implementation: Improved Compatibility and Modern Framework Integration
☆30 · Dec 8, 2025 · Updated 7 months ago
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,082 · Updated this week
PeterGriffinJin / Search-R1
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
☆5,156 · Nov 13, 2025 · Updated 8 months ago
langfengQ / verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆2,154 · Jun 9, 2026 · Updated last month
Simple-Efficient / RL-Factory
Train your Agent model via our easy and efficient framework
☆1,773 · Dec 5, 2025 · Updated 7 months ago
owenliang / owenliang
☆33 · Apr 6, 2026 · Updated 3 months ago
wyt2000 / InverseCoder
[AAAI 2025] The official code of the paper "InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct"(http…
☆16 · Jul 10, 2024 · Updated 2 years ago
ReTool-RL / ReTool
☆386 · Aug 12, 2025 · Updated 11 months ago
NLPJCL / SearchAgent-Zero
SearchAgent-Zero: A Scalable Multi-Turn Search Agent RL Framework
☆143 · Updated this week
smiles724 / Awesome-LLM-RLVR
Collection of latest papers and materials in the area of RLVR!
☆136 · Updated this week
yann168 / boshi-sample-solution
☆15 · Nov 22, 2023 · Updated 2 years ago
Justherozen / TRAILER
[CVPR 2024] Targeted Representation Alignment for Open-World Semi-Supervised Learning
☆14 · Sep 23, 2024 · Updated last year
Solunny / EIAD
☆16 · May 16, 2025 · Updated last year
AgentR1 / Agent-R1
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
☆1,571 · Updated this week
Mryangkaitong / deepseek-r1-gsm8k
☆49 · Feb 10, 2025 · Updated last year
CSfufu / Revisual-R1
[ICLR 2026]🚀ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, mul…
☆212 · Dec 10, 2025 · Updated 7 months ago
Elvin-Yiming-Du / Memory-T1
This repository is used for the time reasoning task in multi-session dialogue systems.
☆17 · Feb 7, 2026 · Updated 5 months ago
verl-project / verl-recipe
A set of examples based on verl for end-to-end RL training recipes.
☆312 · Updated this week
yale-nlp / Bright-Pro
Data and code for ACL 2026 Paper "Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems…
☆19 · Apr 30, 2026 · Updated 2 months ago
zhrli324 / RLEdit
[ICML2025] Official code for "Reinforced Lifelong Editing for Language Models"
☆23 · Feb 23, 2025 · Updated last year
AkaliKong / RecZero
☆16 · Dec 21, 2025 · Updated 7 months ago
chunhuizhang / personal_chatgpt
personal chatgpt
☆416 · Jan 11, 2026 · Updated 6 months ago
zz-haooo / LLMs-Preference-Optimization
☆18 · May 31, 2024 · Updated 2 years ago
MraDonkey / DMAD
[ICLR 2025] Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate
☆25 · Apr 22, 2025 · Updated last year
chenllliang / G1
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
☆103 · May 20, 2025 · Updated last year
ASTRAL-Group / LoRe
When Reasoning Meets Its Laws
☆38 · Jan 2, 2026 · Updated 6 months ago
siyan-zhao / OPSD
☆506 · May 10, 2026 · Updated 2 months ago
wzhwzhwzh0921 / Awesome_LRM_with_Entropy
Introduction about AWESOME_ENTROPY+LRM_PAPERS
☆32 · Dec 16, 2025 · Updated 7 months ago
CJReinforce / PURE
Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆172 · Oct 23, 2025 · Updated 9 months ago