kaiwenzha / rl-tango
[NeurIPS 2025] RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
☆50 · Oct 23, 2025 · Updated 3 months ago
Alternatives and similar repositories for rl-tango
Users that are interested in rl-tango are comparing it to the libraries listed below
- Official Implementation of the paper "Jointly Reinforcing Diversity and Quality in Language Model Generations" ☆55 · Dec 26, 2025 · Updated last month
- ☆33 · Dec 17, 2025 · Updated last month
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning" ☆11 · Jan 10, 2025 · Updated last year
- Official code for Guiding Language Model Math Reasoning with Planning Tokens ☆18 · Feb 29, 2024 · Updated last year
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning" ☆37 · Oct 1, 2025 · Updated 4 months ago
- ☆25 · Jun 10, 2025 · Updated 8 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆26 · Oct 14, 2025 · Updated 4 months ago
- ☆31 · Sep 12, 2025 · Updated 5 months ago
- [ICLR2026] Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping ☆62 · May 22, 2025 · Updated 8 months ago
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses". ☆30 · Aug 18, 2024 · Updated last year
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25] ☆218 · Nov 27, 2025 · Updated 2 months ago
- VisPlay: Self-Evolving Vision-Language Models ☆44 · Updated this week
- Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL). ☆47 · Oct 16, 2025 · Updated 3 months ago
- RewardAnything: Generalizable Principle-Following Reward Models ☆45 · Jun 11, 2025 · Updated 8 months ago
- This is the official repository for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning" ☆59 · Dec 29, 2025 · Updated last month
- Code for "Reasoning to Learn from Latent Thoughts" ☆124 · Mar 28, 2025 · Updated 10 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆159 · Jun 26, 2025 · Updated 7 months ago
- Sotopia-RL: Reward Design for Social Intelligence ☆46 · Jan 29, 2026 · Updated 2 weeks ago
- (ACL-2025 main conference) Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback ☆38 · Jun 24, 2025 · Updated 7 months ago
- A Sober Look at Language Model Reasoning ☆92 · Nov 18, 2025 · Updated 2 months ago
- Your efficient and accurate answer verification system for RL training. ☆41 · Jun 23, 2025 · Updated 7 months ago
- ☆32 · Oct 31, 2024 · Updated last year
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆62 · Oct 24, 2025 · Updated 3 months ago
- This project demonstrates function-calling with Python and Ollama, utilizing the Africa's Talking API to send airtime and messages to pho… ☆18 · Updated this week
- Code for ICML 25 paper "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)" ☆49 · Jun 30, 2025 · Updated 7 months ago
- ☆72 · Jun 10, 2025 · Updated 8 months ago
- Paddle reimplementation of the paper "ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information" (ACL 2021) ☆10 · Nov 15, 2021 · Updated 4 years ago
- DreamSmooth: Improving Model-Based RL with Reward Smoothing (ICLR 2024) ☆12 · May 6, 2024 · Updated last year
- ☆11 · Jan 11, 2022 · Updated 4 years ago
- The code for the paper "A Bayesian Approach to Online Planning" published in ICML 2024. ☆13 · Jun 17, 2024 · Updated last year
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆81 · Dec 25, 2025 · Updated last month
- A collection of heat engines, based on the OpenAI Gym environment framework for use with reinforcement learning applications. ☆15 · Dec 20, 2021 · Updated 4 years ago
- Teaching a humanoid to walk(ish), then displaying in your browser (using tensorflow.js and reinforcement learning) ☆10 · Sep 7, 2020 · Updated 5 years ago
- ☆16 · Feb 22, 2025 · Updated 11 months ago
- code for polite ☆11 · Feb 28, 2024 · Updated last year
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space" ☆24 · Jul 21, 2025 · Updated 6 months ago
- Code release for "Imagination Mechanism: Mesh Information Propagation for Enhancing Data Efficiency in Reinforcement Learning" ☆13 · Oct 7, 2023 · Updated 2 years ago
- This repository contains my models that have been trained to translate from Kikuyu to Kiswahili. It also contains the dataset used for the… ☆12 · Sep 10, 2018 · Updated 7 years ago
- ☆14 · Mar 21, 2024 · Updated last year