☆33Oct 31, 2024Updated last year
Alternatives and similar repositories for RISE
Users who are interested in RISE are comparing it to the libraries listed below
- GenRM-CoT: Data release for verification rationales☆68Oct 16, 2024Updated last year
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision☆27May 26, 2025Updated 9 months ago
- Paper Reproduction Google SCoRE (Training Language Models to Self-Correct via Reinforcement Learning)☆142Sep 21, 2024Updated last year
- ☆13Jul 14, 2024Updated last year
- Official code for Guiding Language Model Math Reasoning with Planning Tokens☆18Feb 29, 2024Updated 2 years ago
- The official source code for "Boosting LLM Agents with Recursive Contemplation for Effective Deception Handling" (ACL 2024, Findings)☆14Aug 12, 2024Updated last year
- ☆28Feb 13, 2026Updated 2 weeks ago
- ☆20Apr 16, 2025Updated 10 months ago
- Minimal Decision Transformer Implementation written in Jax (Flax).☆17Aug 8, 2022Updated 3 years ago
- ☆18Jul 10, 2022Updated 3 years ago
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF☆24Oct 8, 2024Updated last year
- ☆31Sep 12, 2025Updated 5 months ago
- Code for Abstract-to-Executable Trajectory Translation for One Shot Task Generalization (ICML 2023)☆23May 12, 2023Updated 2 years ago
- [NeurIPS 2025 D&B Track] Evaluation Code Repo for Paper "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts"☆41May 22, 2025Updated 9 months ago
- Official implementation of the NeurIPS 2024 paper CORY☆27Dec 22, 2025Updated 2 months ago
- ☆27Sep 22, 2025Updated 5 months ago
- Chain-of-Thought Predictive Control☆57May 1, 2023Updated 2 years ago
- Official code for "Pretraining Representations For Data-Efficient Reinforcement Learning" (NeurIPS 2021)☆55Jul 27, 2021Updated 4 years ago
- Sandbox environment for generalizable agent research☆27Aug 19, 2022Updated 3 years ago
- RL algorithm: Advantage induced policy alignment☆66Aug 11, 2023Updated 2 years ago
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner☆30Jun 27, 2024Updated last year
- ☆68Jun 25, 2024Updated last year
- [NeurIPS 2025] RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning☆52Oct 23, 2025Updated 4 months ago
- Sotopia-RL: Reward Design for Social Intelligence☆46Jan 29, 2026Updated last month
- ☆33Jun 24, 2024Updated last year
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆158Oct 23, 2025Updated 4 months ago
- ☆35Jan 29, 2023Updated 3 years ago
- ☆35Dec 12, 2023Updated 2 years ago
- The official implementation of PFNs4BO: In-Context Learning for Bayesian Optimization☆40Sep 18, 2025Updated 5 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆40Jun 10, 2024Updated last year
- JudgeLRM: Large Reasoning Models as a Judge☆41Jan 29, 2026Updated last month
- Dataset corresponding to the paper: "Form2Seq : A Framework for Higher-Order Form Structure Extraction"☆10Feb 17, 2021Updated 5 years ago
- Rebuilds and completes models of protein complexes using AlphaFold2☆15Updated this week
- Official code for "Visual Attention Emerges from Recurrent Sparse Reconstruction" (ICML 2022)☆36Jul 5, 2022Updated 3 years ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]☆97Apr 9, 2025Updated 10 months ago
- ☆160Nov 23, 2024Updated last year
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆261May 5, 2025Updated 9 months ago
- ☆52Oct 23, 2023Updated 2 years ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆35Mar 19, 2024Updated last year