Oxen-AI/Self-Rewarding-Language-Models

Oxen-AI / Self-Rewarding-Language-Models

Work by the Oxen.ai community to reproduce the Self-Rewarding Language Model paper from MetaAI.

☆135

Alternatives and similar repositories for Self-Rewarding-Language-Models

Users who are interested in Self-Rewarding-Language-Models compare it with the repositories listed below.

schauppi / Self-Rewarding-Language-Models
☆50 · May 13, 2024 · Updated 2 years ago
lucidrains / self-rewarding-lm-pytorch
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI
☆1,411 · Apr 11, 2024 · Updated 2 years ago
open-thought / reasoning-gym-eval
Collection of LLM completions for reasoning-gym task datasets
☆31 · Jul 4, 2025 · Updated last year
keven980716 / weak-to-strong-deception
[ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"
☆15 · Jun 21, 2024 · Updated 2 years ago
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆47 · Apr 15, 2025 · Updated last year
init0xyz / AdaCQR
Implementation of AdaCQR (COLING 2025)
☆15 · Dec 30, 2024 · Updated last year
nirgreshler / bayesian-online-planning
The code for the paper "A Bayesian Approach to Online Planning" published in ICML 2024.
☆13 · Jun 17, 2024 · Updated 2 years ago
McGill-NLP / topiocqa
Code and data for reproducing baselines for TopiOCQA, an open-domain conversational question-answering dataset
☆57 · Nov 15, 2023 · Updated 2 years ago
eric-mitchell / concord
☆14 · Nov 15, 2022 · Updated 3 years ago
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆84 · Jan 14, 2025 · Updated last year
ellenmellon / INSCIT
INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions
☆16 · Jan 21, 2025 · Updated last year
grill-lab / CIS-Tutorial-SIGIR2022
Repository for SIGIR 2022 CIS tutorial
☆20 · Jul 11, 2022 · Updated 4 years ago
princeton-pli / what-makes-good-rm
[NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective
☆44 · Sep 18, 2025 · Updated 10 months ago
Somnef / snake_neat_ai
☆16 · Nov 12, 2022 · Updated 3 years ago
THUDM / LongReward
☆63 · Oct 29, 2024 · Updated last year
d223302 / Over-Reasoning-of-LLMs
Data and code for EACL'24 paper: Over-Reasoning and Redundant Calculation of Large Language Models
☆11 · Jan 23, 2024 · Updated 2 years ago
uclaml / SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
☆1,247 · May 8, 2024 · Updated 2 years ago
ElleLeonne / Lightning-ReLoRA
A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite.
☆34 · Mar 2, 2024 · Updated 2 years ago
keyshor / spectrl_tool
Learning algorithm implementation and experiments in the paper "A Composable Specification Language for Reinforcement Learning Tasks" (ht…
☆18 · Nov 23, 2020 · Updated 5 years ago
thomasgauthier / LLM-self-play
Minimal implementation of the Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models paper (ArXiv 2023, 2401.01335)
☆29 · Mar 1, 2024 · Updated 2 years ago
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆94 · Dec 22, 2024 · Updated last year
quao627 / Awesome-Diffusion-Language-Models
☆35 · Jul 2, 2025 · Updated last year
ai-wand / concise-reasoning
Concise Reasoning via Reinforcement Learning
☆13 · Apr 16, 2025 · Updated last year
bodaay / HuggingChatAllInOne
One Repo To Quickly Build One Docker File for HuggingChat Front and BackEnd
☆26 · Jul 5, 2023 · Updated 3 years ago
MARIO-Math-Reasoning / Super_MARIO
☆341 · Jun 5, 2025 · Updated last year
zankner / CLoud
Critique-out-Loud Reward Models
☆76 · Oct 18, 2024 · Updated last year
niconi19 / Emergent-Response-Planning-in-LLMs
[ICML 2025] Emergent Response Planning in LLMs
☆20 · Jul 1, 2025 · Updated last year
pranavAL / DART
Official Code Repo for the paper "Learning to Play Atari in a World of Tokens" accepted at ICML, 2024
☆11 · Jun 6, 2024 · Updated 2 years ago
ezelikman / quiet-star
Code for Quiet-STaR
☆739 · Aug 21, 2024 · Updated last year
idoh / fast_mamba.np
A pure and fast NumPy implementation of Mamba with cache support.
☆18 · Jun 16, 2024 · Updated 2 years ago
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆28 · Dec 23, 2024 · Updated last year
phonism / CP-Zero
Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.
☆18 · Apr 22, 2025 · Updated last year
JIA-Lab-research / Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
☆398 · Jan 19, 2025 · Updated last year
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆80 · Oct 9, 2025 · Updated 9 months ago
UKPLab / acl2025-diverse-cot
Code for the 2025 ACL publication "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs"
☆32 · Jun 25, 2025 · Updated last year
wangcunxiang / Graph-aS-Tokens
☆10 · Nov 29, 2024 · Updated last year
francescortu / comp-mech
Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals; ACL 2024
☆13 · May 24, 2024 · Updated 2 years ago
yizhongw / llm-temporal-alignment
Methods and evaluation for aligning language models temporally
☆31 · Mar 2, 2024 · Updated 2 years ago
rycolab / kl-rb
This repository contains code for the paper "Better Estimation of the KL Divergence Between Language Models"
☆19 · May 30, 2025 · Updated last year