EIT-NLP / AccuracyParadox-RLHFLinks

[EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models". (by Yanjun Chen)

☆13

Alternatives and similar repositories for AccuracyParadox-RLHF

Users that are interested in AccuracyParadox-RLHF are comparing it to the libraries listed below

Sorting:

UCSB-NLP-Chang / Prereq_tune
Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning"
☆10Updated 6 months ago
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44Updated 3 months ago
WindyLee0822 / CTG
Source code of “Reinforcement Learning with Token-level Feedback for Controllable Text Generation (NAACL 2024)
☆14Updated 7 months ago
DynaMath / DynaMath
A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
☆24Updated 8 months ago
Fu-Dayuan / PreAct
PreAct: Prediction Enhances Agent's Planning Ability (Coling2025)
☆28Updated 7 months ago
googleinterns / localizing-paragraph-memorization
☆14Updated last year
mathllm / Step-Controlled_DPO
☆22Updated last year
ZHZisZZ / weak-to-strong-search
[NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
☆62Updated 7 months ago
CyberAgentAILab / filtered-dpo
Introducing Filtered Direct Preference Optimization (fDPO) that enhances language model alignment with human preferences by discarding lo…
☆15Updated 8 months ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆52Updated 5 months ago
RUCAIBox / BAMBOO
☆35Updated last year
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆51Updated 2 months ago
allenai / sso
Repository for Skill Set Optimization
☆14Updated last year
limenlp / safer-instruct
This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17Updated last year
ValueCompass / Alignment-Goal-Survey
☆30Updated last year
NanshineLoong / Self-Evolving-Benchmark
A framework for evolving and testing question-answering datasets with various models.
☆16Updated last year
halfrot / ALaRM
[ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"
☆25Updated last year
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆38Updated last year
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆35Updated 10 months ago
DRSY / KV_Compression
[EMNLP 2023]Context Compression for Auto-regressive Transformers with Sentinel Tokens
☆25Updated last year
iwangjian / Midi-Tuning
Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024)
☆23Updated 11 months ago
wangjs9 / Aligned-dPM
PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach
☆32Updated last year
MozerWang / DEMO
[ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling
☆16Updated 7 months ago
Yifan-Song793 / GoodBadGreedy
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
☆30Updated last year
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆21Updated last year
wwxu21 / CUT
Source code of "Reasons to Reject? Aligning Language Models with Judgments"
☆58Updated last year
starrYYxuan / LeCo
This the implementation of LeCo
☆31Updated 6 months ago
debjitpaul / Causal_CoT
About The corresponding code from our paper " Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning…
☆12Updated 11 months ago
yihedeng9 / DuoGuard
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
☆26Updated 5 months ago
rookie-joe / AutoPSV
☆48Updated 9 months ago