WooooDyy/LLM-Reverse-Curriculum-RL

Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" presented by Zhiheng Xi et al.

☆116

Alternatives and similar repositories for LLM-Reverse-Curriculum-RL

Users who are interested in LLM-Reverse-Curriculum-RL compare it to the repositories listed below.

huiwy / reflection-on-trees
☆14 · May 9, 2024 · Updated last year
QizhiPei / MathFusion
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)
☆35 · Jul 16, 2025 · Updated 7 months ago
open-thought / reasoning-gym-eval
Collection of LLM completions for reasoning-gym task datasets
☆30 · Jul 4, 2025 · Updated 7 months ago
hbin0701 / Self-Explore
[EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin…
☆51 · May 4, 2024 · Updated last year
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75 · May 20, 2025 · Updated 9 months ago
open-compass / MathBench
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
☆111 · May 22, 2025 · Updated 9 months ago
AlignInc / aligner-replication
Reproduction of the paper "Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction"
☆22 · May 29, 2024 · Updated last year
sarahmart / HARDMath
A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low a…
☆26 · Feb 14, 2025 · Updated last year
THUDM / Self-Contrast
Extensive Self-Contrast Enables Feedback-Free Language Model Alignment
☆21 · Apr 2, 2024 · Updated last year
vint-1 / dreamsmooth
DreamSmooth: Improving Model-Based RL with Reward Smoothing (ICLR 2024)
☆12 · May 6, 2024 · Updated last year
Ryaang / EventRAG
☆18 · Feb 16, 2025 · Updated last year
ictnlp / LevelRAG
The official implementation of "LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented…
☆50 · Apr 12, 2025 · Updated 10 months ago
zhaoxlpku / SubgoalXL
☆25 · Aug 23, 2024 · Updated last year
nick7nlp / FastCuRL
FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning
☆57 · Oct 10, 2025 · Updated 4 months ago
eminorhan / llm-memory
Memory experiments with LLMs
☆11 · Mar 31, 2023 · Updated 2 years ago
XueruiSu / Trust-Region-Preference-Approximation
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
☆14 · Jun 28, 2025 · Updated 8 months ago
NJUDeepEngine / CAEF
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11 · Oct 11, 2024 · Updated last year
satori-reasoning / Satori
[ICML 2025] Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
☆108 · Jun 3, 2025 · Updated 8 months ago
zzhang0179 / Unveiling-Linguistic-Regions-in-LLMs
[ACL 2024] Unveiling Linguistic Regions in Large Language Models
☆33 · Jun 9, 2024 · Updated last year
ezelikman / quiet-star
Code for Quiet-STaR
☆741 · Aug 21, 2024 · Updated last year
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆120 · Dec 10, 2024 · Updated last year
XiaojuanTang / Mars
A benchmark for evaluating situated inductive reasoning
☆15 · Jan 7, 2025 · Updated last year
Infini-AI-Lab / M2PO
☆29 · Oct 8, 2025 · Updated 4 months ago
zhaoxlpku / PromptCoT
☆18 · Apr 10, 2025 · Updated 10 months ago
zepingyu0512 / arithmetic-mechanism
code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
☆12 · Nov 17, 2024 · Updated last year
wenlinyao / HDFlow
Code and data release of the paper Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows
☆14 · Oct 4, 2024 · Updated last year
THUDM / ReST-MCTS
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
☆692 · Jan 20, 2025 · Updated last year
Strong-AI-Lab / Logical-and-abstract-reasoning
Evaluation on Logical Reasoning and Abstract Reasoning Challenges
☆29 · Apr 21, 2025 · Updated 10 months ago
jwhj / OREO
☆116 · Jan 21, 2025 · Updated last year
LAMDASZ-ML / Self-Backtracking
☆52 · Feb 12, 2025 · Updated last year
knoveleng / open-rs
[AAAI 2026] - Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"
☆274 · Feb 20, 2026 · Updated last week
RUCAIBox / Slow_Thinking_with_LLMs
A series of technical reports on Slow Thinking with LLMs
☆760 · Aug 13, 2025 · Updated 6 months ago
xfactlab / orpo
Official repository for ORPO
☆471 · May 31, 2024 · Updated last year
rookie-joe / AutoPSV
☆51 · Oct 28, 2024 · Updated last year
RLHFlow / Self-rewarding-reasoning-LLM
Recipes to train the self-rewarding reasoning LLMs.
☆231 · Mar 2, 2025 · Updated 11 months ago
luchris429 / discovered-policy-optimisation
Code for Discovered Policy Optimisation (NeurIPS 2022)
☆12 · Jun 15, 2023 · Updated 2 years ago
Fu-Dayuan / PreAct
PreAct: Prediction Enhances Agent's Planning Ability (COLING 2025)
☆30 · Dec 12, 2024 · Updated last year
amazon-science / wikiwiki-dataset
☆11 · May 11, 2022 · Updated 3 years ago
zz-haooo / LLMs-Preference-Optimization
☆16 · May 31, 2024 · Updated last year