Paper reproduction of Google SCoRe (Training Language Models to Self-Correct via Reinforcement Learning)
☆142Sep 21, 2024Updated last year
Alternatives and similar repositories for Google_SCoRe
Users that are interested in Google_SCoRe are comparing it to the libraries listed below
- This is my attempt to create a Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g…☆38Jul 9, 2025Updated 7 months ago
- ☆33Oct 31, 2024Updated last year
- hllama is a library which aims to provide a set of utility tools for large language models.☆10Apr 16, 2024Updated last year
- KakaoTalk GPT☆19Apr 9, 2024Updated last year
- Evaluating language model responses using a reward model☆29Feb 23, 2024Updated 2 years ago
- Korean datasets available on Hugging Face☆36Oct 10, 2024Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language☆154Feb 3, 2025Updated last year
- Code for NAACL 2025 paper "AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge"☆16Updated this week
- Covers everything from the history of NLP to serving in a single book.☆24Dec 6, 2025Updated 2 months ago
- Korean datasets for each of the three steps of ChatGPT-style RLHF training☆40Nov 21, 2023Updated 2 years ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".☆83Jan 14, 2025Updated last year
- A problem solver based on an agentic workflow.☆16Mar 1, 2025Updated last year
- Full Stack SolarLLM Zero to All☆169Mar 1, 2025Updated last year
- A repo implementing Google's Chain-of-Thought Reasoning without Prompting in code.☆65Sep 28, 2024Updated last year
- Reading self-supervised learning in NLP in reverse☆27Oct 30, 2022Updated 3 years ago
- This repository contains a replication of the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Allen-Zhu.☆17Sep 13, 2024Updated last year
- Topology Distillation for Recommender System (KDD'21)☆13Sep 2, 2021Updated 4 years ago
- SCoRe: Training Language Models to Self-Correct via Reinforcement Learning☆16Jan 24, 2025Updated last year
- ☆101Dec 22, 2023Updated 2 years ago
- The Universe of Evaluation. All about the evaluation for LLMs.☆232Jul 9, 2024Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆40Jun 10, 2024Updated last year
- I wanted to create a space for doing prompt engineering.☆32Sep 10, 2024Updated last year
- Distilling Task-Specific Knowledge from Teacher Model into BiLSTM☆32Dec 14, 2024Updated last year
- ☆25Dec 12, 2025Updated 2 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]☆97Apr 9, 2025Updated 10 months ago
- This is an implementation of the paper Improve Mathematical Reasoning in Language Models by Automated Process Supervision from google de…☆44Jul 8, 2025Updated 7 months ago
- ☆20Jul 24, 2024Updated last year
- GenRM-CoT: Data release for verification rationales☆68Oct 16, 2024Updated last year
- A recipe for online RLHF and online iterative DPO.☆540Dec 28, 2024Updated last year
- Evaluating Multimodal Generative AI with Korean Educational Standards, NAACL 2025.☆25May 15, 2025Updated 9 months ago
- ☆24Dec 2, 2023Updated 2 years ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆86Oct 26, 2025Updated 4 months ago
- A series of technical reports on Slow Thinking with LLM☆760Aug 13, 2025Updated 6 months ago
- A collection of various LLM tuning methods and data preprocessing code.☆14Feb 23, 2026Updated last week
- ☆11May 18, 2022Updated 3 years ago
- GUIEvalKit: Open-source Evaluation Toolkit for GUI Agents☆19Updated this week
- ☆552Jan 2, 2025Updated last year
- PyTorch code for System-1.x: Learning to Balance Fast and Slow Planning with Language Models☆25Jul 22, 2024Updated last year
- ☆23Dec 17, 2024Updated last year