[NeurIPS 2025] RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
☆52 · Oct 23, 2025 · Updated 4 months ago
Alternatives and similar repositories for RL-Tango
Users who are interested in RL-Tango are comparing it to the repositories listed below
- Official Implementation of the paper "Jointly Reinforcing Diversity and Quality in Language Model Generations" ☆57 · Dec 26, 2025 · Updated 2 months ago
- ☆33 · Dec 17, 2025 · Updated 2 months ago
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning" ☆11 · Jan 10, 2025 · Updated last year
- Official code for Guiding Language Model Math Reasoning with Planning Tokens ☆18 · Feb 29, 2024 · Updated 2 years ago
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning" ☆37 · Oct 1, 2025 · Updated 5 months ago
- ☆25 · Jun 10, 2025 · Updated 8 months ago
- Training tiny models to prove hard theorems ☆41 · Feb 15, 2026 · Updated 2 weeks ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆27 · Oct 14, 2025 · Updated 4 months ago
- [ICLR 2026] Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping ☆63 · May 22, 2025 · Updated 9 months ago
- ☆46 · Jun 24, 2025 · Updated 8 months ago
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses". ☆31 · Aug 18, 2024 · Updated last year
- Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL). ☆48 · Oct 16, 2025 · Updated 4 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25] ☆223 · Nov 27, 2025 · Updated 3 months ago
- ☆27 · Oct 22, 2024 · Updated last year
- RewardAnything: Generalizable Principle-Following Reward Models ☆45 · Jun 11, 2025 · Updated 8 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆124 · Mar 28, 2025 · Updated 11 months ago
- VisPlay: Self-Evolving Vision-Language Models ☆47 · Feb 25, 2026 · Updated last week
- This is the official repository for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning" ☆63 · Dec 29, 2025 · Updated 2 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆159 · Jun 26, 2025 · Updated 8 months ago
- (ACL-2025 main conference) Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback ☆39 · Jun 24, 2025 · Updated 8 months ago
- A Sober Look at Language Model Reasoning ☆93 · Nov 18, 2025 · Updated 3 months ago
- Your efficient and accurate answer verification system for RL training. ☆41 · Jun 23, 2025 · Updated 8 months ago
- ☆33 · Oct 31, 2024 · Updated last year
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆39 · Sep 22, 2024 · Updated last year
- This project demonstrates function-calling with Python and Ollama, utilizing the Africa's Talking API to send airtime and messages to pho… ☆18 · Feb 21, 2026 · Updated 2 weeks ago
- Code for ICML 25 paper "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)" ☆50 · Jun 30, 2025 · Updated 8 months ago
- ☆72 · Jun 10, 2025 · Updated 8 months ago
- The code for the paper "A Bayesian Approach to Online Planning" published in ICML 2024. ☆13 · Jun 17, 2024 · Updated last year
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆81 · Dec 25, 2025 · Updated 2 months ago
- A collection of heat engines, based on the OpenAI Gym environment framework for use with reinforcement learning applications. ☆15 · Dec 20, 2021 · Updated 4 years ago
- Teaching a humanoid to walk(ish), then displaying in your browser (using tensorflow.js and reinforcement learning) ☆10 · Sep 7, 2020 · Updated 5 years ago
- This repository contains my models that have been trained to translate from Kikuyu to Kiswahili. It also contains the dataset used for the… ☆13 · Sep 10, 2018 · Updated 7 years ago
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space" ☆25 · Jul 21, 2025 · Updated 7 months ago
- DreamSmooth: Improving Model-Based RL with Reward Smoothing (ICLR 2024) ☆12 · May 6, 2024 · Updated last year
- ☆14 · Mar 21, 2024 · Updated last year
- [ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization ☆18 · Feb 14, 2026 · Updated 2 weeks ago
- ☆16 · Feb 22, 2025 · Updated last year
- ☆46 · Mar 20, 2023 · Updated 2 years ago
- ☆11 · Jan 11, 2022 · Updated 4 years ago