kaiwenzha/RL-Tango

[NeurIPS 2025] RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning

☆57

Alternatives and similar repositories for RL-Tango

Users interested in RL-Tango are comparing it to the repositories listed below.

HansiZeng / scaling-retriever
[SIGIR 2025] The official repo for "Scaling Sparse and Dense Retrieval in Decoder-Only LLMs"
☆22 • Mar 31, 2025 • Updated last year
TIGER-AI-Lab / Hierarchical-Reasoner
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]
☆64 • Apr 11, 2026 • Updated 3 months ago
liushulinle / MarsRL
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
☆18 • Nov 18, 2025 • Updated 8 months ago
ltzheng / SimpleTIR
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆401 • Mar 30, 2026 • Updated 3 months ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189 • Jul 23, 2025 • Updated last year
WujiangXu / EPO
The code for the paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"
☆40 • Jul 13, 2026 • Updated 2 weeks ago
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆229 • Nov 27, 2025 • Updated 8 months ago
suu990901 / KlearReasoner
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
☆82 • Dec 25, 2025 • Updated 7 months ago
shengliu66 / FractionalReason
Official GitHub repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17 • Jun 30, 2025 • Updated last year
liushulinle / UloRL
An Ultra-Long Output Reinforcement Learning Approach
☆23 • Jul 31, 2025 • Updated 11 months ago
zhangxy-2019 / critique-GRPO
[ICML 2026 Spotlight] Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
☆70 • Jun 3, 2026 • Updated last month
zhyang2226 / AR-Lopti
[ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
☆46 • May 20, 2025 • Updated last year
fengranMark / OpenDecoder
A repository for the OpenDecoder framework: Open Large Language Model Decoding to Incorporate Document Quality in RAG (WWW 2026)
☆26 • Jan 27, 2026 • Updated 6 months ago
cxcscmu / General-AgentBench
Benchmark Test-Time Scaling of General LLM Agents
☆20 • Apr 14, 2026 • Updated 3 months ago
YujunZhou / EVOL-RL
Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL).
☆51 • Mar 31, 2026 • Updated 3 months ago
DoYangTan / verl-rubric
☆29 • Jan 31, 2026 • Updated 5 months ago
THUDM / TreeRL
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25
☆99 • Jun 16, 2025 • Updated last year
Parallel-Reasoning / APR
[COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models
☆145 • Dec 17, 2025 • Updated 7 months ago
Kwai-Klear / CE-GPPO
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
☆16 • Jan 23, 2026 • Updated 6 months ago
SiliangZeng / Multi-Turn-RL-Agent
☆139 • Jun 11, 2025 • Updated last year
sunblaze-ucb / Intuitor
[ICLR 2026] Learning to Reason without External Rewards
☆420 • Jan 26, 2026 • Updated 6 months ago
JingyangYi / ShorterBetter
☆18 • Jul 31, 2025 • Updated 11 months ago
complex-reasoning / RPG
[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆76 • Jun 29, 2026 • Updated last month
ZJU-REAL / HBPO
☆34 • Aug 11, 2025 • Updated 11 months ago
royeisen / reasoning_loading_bar
☆56 • Jul 7, 2025 • Updated last year
tianyi-lab / C3PO
[COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"
☆21 • Apr 9, 2025 • Updated last year
sail-sg / feedback-conditional-policy
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
☆65 • Jan 5, 2026 • Updated 6 months ago
Infini-AI-Lab / GRESO
☆82 • Jun 8, 2026 • Updated last month
CSSLab / ThinkTwice
Jointly Optimizing Large Language Models for Reasoning and Self-Refinement
☆15 • Apr 22, 2026 • Updated 3 months ago
BaohaoLiao / SAGE
Self-Hinting Language Models Enhance Reinforcement Learning
☆27 • Mar 28, 2026 • Updated 4 months ago
RLsys-Foundation / APRIL
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra…
☆60 • Oct 11, 2025 • Updated 9 months ago
facebookresearch / darling
Official implementation of the paper "Jointly Reinforcing Diversity and Quality in Language Model Generations"
☆61 • May 8, 2026 • Updated 2 months ago
Trae1ounG / Pretrain_Space_RLVR
[arXiv: 2604.14142] From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
☆17 • Apr 16, 2026 • Updated 3 months ago
QwenLM / RationaleRM
☆34 • Mar 18, 2026 • Updated 4 months ago
Frostlinx / Socratic-Zero
Socratic-Zero is a fully autonomous framework that generates high-quality training data for mathematical reasoning
☆37 • Oct 26, 2025 • Updated 9 months ago
WeiXiongUST / Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning
This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP…
☆32 • Dec 5, 2024 • Updated last year
lichengliu03 / unary-feedback
☆44 • Mar 31, 2026 • Updated 3 months ago
RLHFlow / Minimal-RL
☆275 • May 14, 2025 • Updated last year
ibisbill / Transferability-of-LLM-Reasoning
☆111 • Jul 6, 2026 • Updated 3 weeks ago