schauppi / Self-Rewarding-Language-Models

☆47

Alternatives and similar repositories for Self-Rewarding-Language-Models

Users who are interested in Self-Rewarding-Language-Models are comparing it to the libraries listed below.

sher222 / LeReT
Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
☆51 Updated 11 months ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆77 Updated 6 months ago
TRI-ML / linear_open_lm
A repository for research on medium-sized language models.
☆78 Updated last year
thomasgauthier / LLM-self-play
Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv 2401.01335)
☆29 Updated last year
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)
☆71 Updated last year
dinobby / MAgICoRE
☆23 Updated last year
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆108 Updated 4 months ago
SalesforceAIResearch / LaTRO
☆122 Updated 8 months ago
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆108 Updated 3 months ago
SALT-NLP / demonstrated-feedback
☆128 Updated last year
ytyz1307zzh / RefAug
Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"
☆55 Updated last year
hamishivi / EasyLM
Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆75 Updated last year
THUDM / T1
RL Scaling and Test-Time Scaling (ICML'25)
☆111 Updated 9 months ago
Oxen-AI / Self-Rewarding-Language-Models
This is work done by the Oxen.ai Community, trying to reproduce the Self-Rewarding Language Model paper from MetaAI.
☆130 Updated 11 months ago
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆90 Updated last year
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆59 Updated last year
para-lost / ReBase
ReBase: Training Task Experts through Retrieval Based Distillation
☆29 Updated 8 months ago
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆115 Updated 8 months ago
architsharma97 / dpo-rlaif
☆100 Updated last year
WindyLee0822 / Process_Q_Model
Official implementation of the paper "Process Reward Model with Q-value Rankings"
☆64 Updated 8 months ago
AlexCuadron / ThinkingAgent
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
☆93 Updated 5 months ago
arcee-ai / DAM
☆55 Updated 11 months ago
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆42 Updated last year
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆48 Updated 7 months ago
allenai / IFBench
☆83 Updated last week
sunblaze-ucb / reasoning_ladder
☆35 Updated 5 months ago
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆82 Updated last year
allenai / clin
☆86 Updated last year
LLM360 / crystalcoder-data-prep
Data preparation code for CrystalCoder 7B LLM
☆45 Updated last year
thu-coai / SPaR
☆46 Updated 4 months ago