EIT-NLP / AccuracyParadox-RLHF

[EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models" (by Yanjun Chen).
12 · Updated 3 weeks ago

Related projects

Alternatives and complementary repositories for AccuracyParadox-RLHF