kaiwenzha / rl-tango

[NeurIPS 2025] RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning

☆50

Alternatives and similar repositories for rl-tango

Users that are interested in rl-tango are comparing it to the libraries listed below


facebookresearch / darling
Official Implementation of the paper "Jointly Reinforcing Diversity and Quality in Language Model Generations"
☆55 · Dec 26, 2025 · Updated last month
Yixiao-Song / VeriScore
☆33 · Dec 17, 2025 · Updated last month
UCSB-NLP-Chang / Prereq_tune
Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning"
☆11 · Jan 10, 2025 · Updated last year
WANGXinyiLinda / planning_tokens
Official code for Guiding Language Model Math Reasoning with Planning Tokens
☆18 · Feb 29, 2024 · Updated last year
WujiangXu / EPO
The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"
☆37 · Oct 1, 2025 · Updated 4 months ago
Jiahao004 / DeepTheorem
☆25 · Jun 10, 2025 · Updated 8 months ago
rosieyzh / openrlhf-pretrain
Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"
☆26 · Oct 14, 2025 · Updated 4 months ago
ritzz-ai / PACS
☆31 · Sep 12, 2025 · Updated 5 months ago
hkust-nlp / Laser
[ICLR2026] Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆62 · May 22, 2025 · Updated 8 months ago
psunlpgroup / ReaLMistake
This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".
☆30 · Aug 18, 2024 · Updated last year
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆218 · Nov 27, 2025 · Updated 2 months ago
bruno686 / VisPlay
VisPlay: Self-Evolving Vision-Language Models
☆44 · Updated this week
YujunZhou / EVOL-RL
Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL).
☆47 · Oct 16, 2025 · Updated 3 months ago
WisdomShell / RewardAnything
RewardAnything: Generalizable Principle-Following Reward Models
☆45 · Jun 11, 2025 · Updated 8 months ago
shiwk24 / MathCanvas
This is the official repository for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning"
☆59 · Dec 29, 2025 · Updated last month
ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆124 · Mar 28, 2025 · Updated 10 months ago
RM-R1-UIUC / RM-R1
RM-R1: Unleashing the Reasoning Potential of Reward Models
☆159 · Jun 26, 2025 · Updated 7 months ago
sotopia-lab / sotopia-rl
Sotopia-RL: Reward Design for Social Intelligence
☆46 · Jan 29, 2026 · Updated 2 weeks ago
InternScience / Dolphin
(ACL-2025 main conference) Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback
☆38 · Jun 24, 2025 · Updated 7 months ago
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆92 · Nov 18, 2025 · Updated 2 months ago
uw-nsl / TinyV
Your efficient and accurate answer verification system for RL training.
☆41 · Jun 23, 2025 · Updated 7 months ago
cmu-mind / RISE
☆32 · Oct 31, 2024 · Updated last year
TIGER-AI-Lab / Hierarchical-Reasoner
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning
☆62 · Oct 24, 2025 · Updated 3 months ago
Shuyib / tool_calling_api
This project demonstrates function-calling with Python and Ollama, utilizing the Africa's Talking API to send airtime and messages to pho…
☆18 · Updated this week
princeton-pli / MeCo
Code for ICML 25 paper "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)"
☆49 · Jun 30, 2025 · Updated 7 months ago
test-time-interaction / TTI
☆72 · Jun 10, 2025 · Updated 8 months ago
27182812 / ChineseBERT_paddle
Paddle reproduction of the paper "ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information" (ACL 2021)
☆10 · Nov 15, 2021 · Updated 4 years ago
vint-1 / dreamsmooth
DreamSmooth: Improving Model-Based RL with Reward Smoothing (ICLR 2024)
☆12 · May 6, 2024 · Updated last year
inboxedshoe / RP-DQN
☆11 · Jan 11, 2022 · Updated 4 years ago
nirgreshler / bayesian-online-planning
The code for the paper "A Bayesian Approach to Online Planning" published in ICML 2024.
☆13 · Jun 17, 2024 · Updated last year
suu990901 / KlearReasoner
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
☆81 · Dec 25, 2025 · Updated last month
CLEANit / heatenginegym
A collection of heat engines, based on the OpenAI Gym environment framework for use with reinforcement learning applications.
☆15 · Dec 20, 2021 · Updated 4 years ago
wassname / rl_2d_walker.js
Teaching a humanoid to walk(ish), then displaying in your browser (using tensorflow.js and reinforcement learning)
☆10 · Sep 7, 2020 · Updated 5 years ago
Improbable-AI / orso
☆16 · Feb 22, 2025 · Updated 11 months ago
holken / polite
code for polite
☆11 · Feb 28, 2024 · Updated last year
LUMIA-Group / PonderingLM
Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space"
☆24 · Jul 21, 2025 · Updated 6 months ago
OuAzusaKou / imagination_mechanism
Code release for "Imagination Mechanism: Mesh Information Propagation for Enhancing Data Efficiency in Reinforcement Learning"
☆13 · Oct 7, 2023 · Updated 2 years ago
starnleymbote / Kikuyu_Kiswahili-translation
This repository contains my models that have been trained to translate from Kikuyu to Kiswahili. It also contains the dataset used for the…
☆12 · Sep 10, 2018 · Updated 7 years ago
ReidarRiveland / Instruct-RNN
☆14 · Mar 21, 2024 · Updated last year
