hscspring/rl-llm-nlp

Curated, opinionated index of post-R1 LLM × Reinforcement Learning. Many deep-dive blog posts cross-linked to many papers — GRPO, DAPO, DPO, PPO, RLHF, GSPO, CISPO, VAPO, Reward Modeling, MoE RL stability, Verifier-Free RL, Training-Free RL, Agentic RL, DeepSeek-R1 reproduction.

☆71

Alternatives and similar repositories for rl-llm-nlp

Users who are interested in rl-llm-nlp are comparing it to the repositories listed below.

quchangle1 / COLT
View on GitHub
The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models.
☆26 • Nov 6, 2024 • Updated last year
Jikai0Wang / Speculative_CoT
View on GitHub
☆20 • May 14, 2025 • Updated last year
zwhong714 / PSFT
View on GitHub
[ICLR 2026] PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, co…
☆38 • Sep 9, 2025 • Updated 10 months ago
StarDewXXX / AdaR1
View on GitHub
The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"
☆24 • May 6, 2026 • Updated 2 months ago
eric-haibin-lin / verl-data
View on GitHub
☆14 • May 12, 2025 • Updated last year
27182812 / ChineseBERT_paddle
View on GitHub
A Paddle reproduction of the paper ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information (ACL 2021)
☆10 • Nov 15, 2021 • Updated 4 years ago
hahahawu / Long-to-Short-via-Model-Merging
View on GitHub
Model merging is a highly efficient approach for long-to-short reasoning.
☆103 • Oct 15, 2025 • Updated 9 months ago
L1aoXingyu / llm-infer-bench
View on GitHub
☆12 • Sep 1, 2023 • Updated 2 years ago
Nathangitlab / Backdoor-Attacks-on-Crowd-Counting
View on GitHub
Code for the ACM MM paper "Backdoor Attack on Crowd Counting"
☆17 • Jul 10, 2022 • Updated 4 years ago
yyi17 / DeepGlyco
View on GitHub
Prediction of glycopeptide fragment mass spectra by deep learning
☆12 • Feb 20, 2024 • Updated 2 years ago
uservan / speculative_thinking
View on GitHub
☆34 • Oct 13, 2025 • Updated 9 months ago
wjn1996 / KP-PLM
View on GitHub
(Accepted by EMNLP 2022, main long) Knowledge Prompting in Pre-trained Language Model for Natural Language Understanding
☆15 • Oct 29, 2022 • Updated 3 years ago
ielab / llm-qlm
View on GitHub
Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
☆17 • Oct 26, 2023 • Updated 2 years ago
amazon-science / Self-Aligned-Reward-Towards_Effective_and_Efficient_Reasoners
View on GitHub
☆21 • Apr 21, 2026 • Updated 3 months ago
Chongjie-Si / Subspace-Tuning
View on GitHub
A generalized framework for subspace tuning methods in parameter efficient fine-tuning.
☆182 • Jan 29, 2026 • Updated 5 months ago
kevinng77 / blenderbot_paddle
View on GitHub
A Paddle reproduction of the paper Recipes for building an open-domain chatbot
☆11 • Nov 1, 2021 • Updated 4 years ago
efarrell1 / train_sparse_autoencoder
View on GitHub
Trains Sparse Autoencoders based on outputs from language models
☆11 • Oct 7, 2024 • Updated last year
KMnO4-zx / paper-workflow
View on GitHub
☆37 • Jan 20, 2026 • Updated 6 months ago
caiqizh / LUQ
View on GitHub
☆14 • Jan 14, 2026 • Updated 6 months ago
RUC-GSAI / Llama-3-SynE
View on GitHub
Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve …
☆40 • May 31, 2025 • Updated last year
Mia-Cong / SWIFT
View on GitHub
Official implementation of "Can Test-Time Scaling Improve World Foundation Model?"
☆15 • Jul 12, 2025 • Updated last year
Ewenwan / pytorch-playground
View on GitHub
Model quantization project. Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception,…
☆12 • Aug 3, 2018 • Updated 7 years ago
IBM / auto-contrastive-generation
View on GitHub
Text generation using language models with multiple exit heads
☆16 • Sep 18, 2025 • Updated 10 months ago
vast-ai / vast-pyworker
View on GitHub
☆12 • May 20, 2025 • Updated last year
bradhilton / o1-chain-of-thought
View on GitHub
o1 Chain of Thought Examples
☆33 • Oct 4, 2024 • Updated last year
Fyuan0206 / Ancient_Books
View on GitHub
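The ancient-books interpretation model is a learning aid built on InternLM2-7B, designed to help users understand and appreciate classical Chinese literature and culture. It offers classical poetry appreciation, Classical Chinese translation, idiom explanation, annotation of the Analects, and interpretation of the Hundred Family Surnames, helping users grasp the essence of classical poetry, historical texts, idiom allusions, and surname culture. It is intended for academic researchers, students, and…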
☆15 • May 26, 2026 • Updated last month
MannLabs / DeepCollisionalCrossSection
View on GitHub
☆10 • Mar 2, 2021 • Updated 5 years ago
euiin / SMART
View on GitHub
SMART introduces a novel test-time framework where Small Language Models (SLMs) reason step-by-step, and Large Language Models (LLMs) pro…
☆12 • Jul 9, 2025 • Updated last year
wux-labs / OpenXLab-IntelligentSalesAssistant
View on GitHub
☆19 • Jun 21, 2024 • Updated 2 years ago
wang8740 / MAP
View on GitHub
Documentation at
☆14 • Mar 27, 2025 • Updated last year
wxr99 / HolisticPU
View on GitHub
Beyond Myopia: Learning from Positive and Unlabeled Data through Holistic Predictive Trends [NeurIPS 2023]
☆10 • Jan 28, 2024 • Updated 2 years ago
zhuhanqing / Lightening-Transformer-AE
View on GitHub
Artifact evaluation for HPCA'24 paper Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accele…
☆11 • Mar 3, 2024 • Updated 2 years ago
zhxieml / remiss-jailbreak
View on GitHub
☆33 • Jun 24, 2024 • Updated 2 years ago
fangjf1 / OpenSafeMLRM
View on GitHub
The first toolkit for MLRM safety evaluation, providing unified interface for mainstream models, datasets, and jailbreaking methods!
☆15 • Apr 8, 2025 • Updated last year
hughplay / memo
View on GitHub
📝 Anything for coding faster and more comfortably.
☆13 • Jan 21, 2026 • Updated 6 months ago
YunjiaXi / Awesome-Search-Agent-Papers
View on GitHub
☆171 • Jul 2, 2026 • Updated 2 weeks ago
KMnO4-zx / paper-insight
View on GitHub
Paper Insight - AI-driven intelligent analysis of academic papers
☆126 • Updated this week
yyDing1 / ScaleQuest
View on GitHub
[ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM…
☆69 • Oct 27, 2024 • Updated last year
deeplearning-wisc / sal
View on GitHub
Source code for the ICLR'24 paper "How does unlabeled data provably help OOD detection?"
☆13 • Feb 1, 2024 • Updated 2 years ago