architsharma97 / dpo-rlaif

☆100

Alternatives and similar repositories for dpo-rlaif

Users who are interested in dpo-rlaif are comparing it to the repositories listed below.

IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆168 Updated last month
SynthLabsAI / big-math
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
☆65 Updated 7 months ago
ScalerLab / JudgeBench
☆102 Updated 11 months ago
SalesforceAIResearch / LaTRO
☆122 Updated 8 months ago
openai / safety-rbr-code-and-data
Code and example data for the paper: Rule Based Rewards for Language Model Safety
☆201 Updated last year
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75 Updated 5 months ago
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆59 Updated last year
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆107 Updated 2 months ago
ucl-dark / llm_debate
Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"
☆117 Updated last year
shenao-zhang / SELM
The official implementation of Self-Exploring Language Models (SELM)
☆64 Updated last year
da03 / Internalize_CoT_Step_by_Step
☆195 Updated 6 months ago
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆47 Updated 7 months ago
hbin0701 / Self-Explore
[EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin…
☆51 Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123 Updated last year
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆71 Updated last year
QingruZhang / PASTA
PASTA: Post-hoc Attention Steering for LLMs
☆123 Updated 11 months ago
kyegomez / Lets-Verify-Step-by-Step
"Improving Mathematical Reasoning with Process Supervision" by OPENAI
☆111 Updated this week
WindyLee0822 / Process_Q_Model
Official implementation of the paper "Process Reward Model with Q-value Rankings"
☆64 Updated 8 months ago
SALT-NLP / demonstrated-feedback
☆128 Updated last year
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆62 Updated last year
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆181 Updated 6 months ago
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54 Updated last year
Linear95 / SPAG
Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024
☆140 Updated 8 months ago
hkust-nlp / llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆142 Updated last year
meg-tong / sycophancy-eval
Datasets from the paper "Towards Understanding Sycophancy in Language Models"
☆94 Updated last year
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆57 Updated last year
facebookresearch / rlfh-gen-div
Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity"
☆47 Updated last year
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆42 Updated last year
jwhj / OREO
☆116 Updated 9 months ago
joeljang / RLPHF
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
☆110 Updated 2 years ago