Curated, opinionated index of post-R1 LLM × Reinforcement Learning. Many deep-dive blog posts cross-linked to many papers — GRPO, DAPO, DPO, PPO, RLHF, GSPO, CISPO, VAPO, Reward Modeling, MoE RL stability, Verifier-Free RL, Training-Free RL, Agentic RL, DeepSeek-R1 reproduction.
☆69Jun 22, 2026Updated last week
Alternatives and similar repositories for rl-llm-nlp
Users that are interested in rl-llm-nlp are comparing it to the libraries listed below.
- ☆14May 12, 2025Updated last year
- Official Implementation of Avoiding spurious correlations via logit correction☆17May 6, 2023Updated 3 years ago
- (ACL 2025 Main) Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillat…☆35Aug 23, 2025Updated 10 months ago
- Documentation at☆14Mar 27, 2025Updated last year
- Classify image and text with ResNet and BERT models using Pytorch☆13Jul 7, 2020Updated 5 years ago
- ☆17Jun 10, 2025Updated last year
- Trains Sparse Autoencoders based on outputs from language models☆11Oct 7, 2024Updated last year
- ☆16Jul 12, 2024Updated last year
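- Account health-check tool for creators on Xiaohongshu, Douyin, Kuaishou, WeChat Channels, and Bilibili: scans the same niche to find benchmark accounts, breaks down why viral posts went viral, diagnoses why content gets no views, and provides paste-ready imitation drafts. Claude Code skill.☆127Jun 21, 2026Updated last week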
- [ACL 2026 Main Conference] Paper list for the survey "A Survey of Deep Learning for Geometry Problem Solving"☆36Sep 14, 2025Updated 9 months ago
- 2024 Guangxi Digital Open Innovation Application Competition: multimodal news rumor classification☆19Jan 18, 2025Updated last year
- FashionAI challenge based on OpenPose☆17Apr 1, 2019Updated 7 years ago
- ☆19Jun 21, 2024Updated 2 years ago
- [AAAI 2025] Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks☆12Jun 19, 2025Updated last year
- [Paper][EMNLP 2025] Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM Enriching☆35Feb 8, 2026Updated 4 months ago
- Source codes for the paper "Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning" (PDMER) which p…☆14Mar 24, 2025Updated last year
- Code and Data for ACL 2025 Paper "Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework".☆25Oct 3, 2025Updated 8 months ago
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆26Nov 29, 2024Updated last year
- ☆18Nov 22, 2025Updated 7 months ago
- GPTCloneBench is a clone detection benchmark based on SemanticCloneBench and GPT.☆15Feb 5, 2025Updated last year
- [EMNLP 2025] HydraRAG: Structured Cross-Source Enhanced Large Language Model Reasoning☆56Nov 12, 2025Updated 7 months ago
- A lightweight, continuously-updated catalog of research papers on AI agents.☆29Oct 13, 2025Updated 8 months ago
- ☆17Nov 3, 2024Updated last year
- A curated collection of research and techniques for protecting intellectual property of large language models, including watermarking, fi…☆51Jun 10, 2026Updated 3 weeks ago
- (ACL 2025 Main) Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification - Offici…☆21Dec 26, 2025Updated 6 months ago
- Targeted Data Generation with Large Language Models☆19Jun 25, 2024Updated 2 years ago
- Official implementation of the paper "Sparse Feature Factorization for Recommender Systems with Knowledge Graphs"☆22Oct 13, 2022Updated 3 years ago
- A Survey of Direct Preference Optimization (DPO)☆96Jul 4, 2025Updated 11 months ago
- Code for Multi-Aspect Cross-modal Quantization for Generative Recommendation. (AAAI 2026 Oral)☆43Dec 9, 2025Updated 6 months ago
- ☆20May 14, 2025Updated last year
- Formal representation and solving for Euclidean plane geometry problems.☆42May 22, 2026Updated last month
- Official repository of the ACL 2024 paper "Rethinking Task-Oriented Dialogue Systems: From Complex Modularity to Zero-Shot Autonomous Age…☆20May 28, 2024Updated 2 years ago
- Website for release of TellMeWhy dataset for why question answering☆14Nov 11, 2022Updated 3 years ago
- ☆22Aug 18, 2024Updated last year
- An AI-powered content conversion tool that transforms text, web content, or HTML code into beautifully designed card images.☆33Jul 29, 2025Updated 11 months ago
- Data and baseline code of EMNLP 2021 paper "MLEC-QA: A Chinese Multi-Choice Biomedical Question Answering Dataset".☆32Nov 5, 2021Updated 4 years ago
- JLU drcom client written in golang.☆12Sep 4, 2019Updated 6 years ago
- ☆10Oct 20, 2020Updated 5 years ago
- ☆11Apr 4, 2018Updated 8 years ago