BY571 / SCoRe

SCoRe: Training Language Models to Self-Correct via Reinforcement Learning

☆13

Alternatives and similar repositories for SCoRe

Users who are interested in SCoRe are comparing it to the libraries listed below.


uservan / ThinkPO
☆17 · Updated 3 months ago
Gen-Verse / CURE
[NeurIPS 2025 Spotlight] ReasonFlux-Coder: Open-Source LLM Coders with Co-Evolving Reinforcement Learning
☆133 · Updated 2 months ago
haozheji / exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
☆57 · Updated last year
GuanghaoYe / Emergence-of-Thinking
☆53 · Updated 9 months ago
GraphPKU / Case_or_Rule
Exploring whether LLMs perform case-based or rule-based reasoning
☆30 · Updated last year
ZhaolinGao / REFUEL
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
☆24 · Updated last year
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆21 · Updated last year
SIMONLQY / RethinkMCTS
☆30 · Updated last year
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆38 · Updated last year
likenneth / dialogue_action_token
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
☆28 · Updated last year
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆86 · Updated 6 months ago
LAMDASZ-ML / Self-Backtracking
☆51 · Updated 9 months ago
Ablustrund / APPS_Plus
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
☆73 · Updated last year
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆112 · Updated 4 months ago
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)
☆72 · Updated last year
thu-coai / SPaR
☆46 · Updated 5 months ago
Reason-Wang / NAT
[NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…
☆29 · Updated last year
zitian-gao / SC-MCTS
Interpretable Contrastive Monte Carlo Tree Search Reasoning
☆48 · Updated last year
Jiahao004 / DeepTheorem
☆24 · Updated 5 months ago
facebookresearch / dualformer
Implementation of Dualformer
☆24 · Updated 8 months ago
haotiansun14 / BBox-Adapter
Lightweight Adapting for Black-Box Large Language Models
☆24 · Updated last year
cmu-mind / RISE
☆33 · Updated last year
Asap7772 / understanding-rlhf
Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…
☆32 · Updated last year
waterhorse1 / Natural-language-RL
Natural Language Reinforcement Learning
☆100 · Updated 4 months ago
MLE-Dojo / MLE-Dojo
☆80 · Updated last month
scaleapi / plansearch
e
☆41 · Updated 7 months ago
tsinghua-fib-lab / SmartAgent
The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".
☆27 · Updated 3 months ago
mathllm / Step-Controlled_DPO
☆23 · Updated last year
weizhepei / WebAgent-R1
[EMNLP 2025] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
☆60 · Updated 3 weeks ago
THUDM / T1
RL Scaling and Test-Time Scaling (ICML'25)
☆112 · Updated 10 months ago