sanowl/Self-Correcting-LLM--Reinforcement-Learning-

sanowl / Self-Correcting-LLM--Reinforcement-Learning-

This is my attempt to create a self-correcting LLM based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by Google.

☆37

Alternatives and similar repositories for Self-Correcting-LLM--Reinforcement-Learning-

Users who are interested in Self-Correcting-LLM--Reinforcement-Learning- are comparing it to the repositories listed below.

daje0601 / Google_SCoRe
Paper reproduction of Google SCoRe (Training Language Models to Self-Correct via Reinforcement Learning)
☆141 · Sep 21, 2024 · Updated last year
ChiyuSONG / dynamics-of-instruction-tuning
☆18 · Mar 3, 2025 · Updated last year
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆68 · Oct 16, 2024 · Updated last year
gpengzhi / Bi-SimCut
Code for NAACL 2022 main conference paper "Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation"
☆12 · May 8, 2023 · Updated 3 years ago
hyunseoklee-ai / ReMoDetect
ReMoDetect: Reward Models Recognize Aligned LLM's Generations (NeurIPS 2024)
☆17 · Nov 15, 2024 · Updated last year
CarperAI / Algorithm-Distillation-RLHF
☆35 · Jan 29, 2023 · Updated 3 years ago
morning9393 / ETPO
☆14 · Mar 5, 2024 · Updated 2 years ago
HanNight / AdaCAD
Code for NAACL 2025 paper "AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge"
☆16 · Mar 2, 2026 · Updated 4 months ago
kuc2477 / pytorch-memn2n
PyTorch implementation of FAIR's paper "End-to-End Memory Network", NIPS 2015
☆12 · Oct 19, 2017 · Updated 8 years ago
vipulgupta1011 / CALM
☆11 · Oct 2, 2023 · Updated 2 years ago
purbeshmitra / MOTIF
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
☆17 · Jul 6, 2025 · Updated last year
RUCBM / AtomMem
☆27 · Mar 31, 2026 · Updated 3 months ago
tianyi-lab / R2-T2
[ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"
☆19 · Mar 10, 2025 · Updated last year
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆208 · Apr 17, 2025 · Updated last year
NishanthVAnand / prediction-and-control-in-continual-reinforcement-learning
Code to reproduce results from the paper: Prediction and Control in Continual Reinforcement Learning, NeurIPS 2023.
☆13 · May 10, 2024 · Updated 2 years ago
max-andr / adversarial-random-search-gpt4
Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023]
☆43 · Apr 28, 2024 · Updated 2 years ago
Unbabel / smaug
Python package to augment multilingual data
☆15 · Feb 15, 2023 · Updated 3 years ago
am-bean / lingOly
A benchmark for language models based on the UK Linguistics Olympiad
☆12 · Mar 3, 2025 · Updated last year
BUPT-ANTlab / PEPCRL-MVP
☆17 · Oct 25, 2023 · Updated 2 years ago
yuki-younai / Jailbreak-R1
Official implementation of Jailbreak-R1
☆15 · Jul 16, 2025 · Updated last year
PeterPopma / unityfc
Script for Unity soccer game
☆15 · Jul 17, 2022 · Updated 4 years ago
tpoisonooo / open-r1
Fully open reproduction of DeepSeek-R1
☆11 · Mar 24, 2025 · Updated last year
RLHFlow / GVM
☆16 · Jul 29, 2025 · Updated 11 months ago
feyzaakyurek / rl4f
Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.
☆63 · Nov 27, 2024 · Updated last year
zhengzx-nlp / past-and-future-nmt
Implementation of "Modeling Past and Future for Neural Machine Translation"
☆15 · Mar 16, 2018 · Updated 8 years ago
CSSLab / ThinkTwice
Jointly Optimizing Large Language Models for Reasoning and Self-Refinement
☆15 · Apr 22, 2026 · Updated 3 months ago
ZurichNLP / understanding-mbr
☆17 · Apr 28, 2022 · Updated 4 years ago
microsoft / RLHF-APA
RL algorithm: Advantage induced policy alignment
☆66 · Aug 11, 2023 · Updated 2 years ago
awwang10 / llmpromptboosting
Accompanying code for "Boosted Prompt Ensembles for Large Language Models"
☆31 · Apr 13, 2023 · Updated 3 years ago
MurtyShikhar / structural-grokking
Code for our ACL '23 paper titled "Grokking of Hierarchical Structure in Vanilla Transformers"
☆26 · Oct 8, 2023 · Updated 2 years ago
apexrl / CoDAIL
Implementation of CoDAIL in the ICLR 2020 paper "Multi-Agent Interactions Modeling with Correlated Policies"
☆19 · Jun 17, 2021 · Updated 5 years ago
hanningzhang / prm
☆17 · Nov 3, 2024 · Updated last year
GAIR-NLP / self-improvement-reversal
☆13 · Jul 14, 2024 · Updated 2 years ago
kschweig / OfflineRL
Experiment for Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning
☆26 · Jan 16, 2023 · Updated 3 years ago
thu-rllab / LaRe
[AAAI-25] Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning.
☆34 · May 29, 2025 · Updated last year
ZhaolinGao / REFUEL
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
☆25 · Oct 8, 2024 · Updated last year
john-hewitt / model-editing-canonical-examples
☆14 · Feb 12, 2024 · Updated 2 years ago
alecwangcq / f-divergence-dpo
Direct preference optimization with f-divergences.
☆17 · Nov 3, 2024 · Updated last year
Simplified-Reasoning / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆460 · Mar 20, 2026 · Updated 4 months ago