EIT-NLP / AccuracyParadox-RLHF
[EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models". (by Yanjun Chen)
☆13 · Updated 8 months ago
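For context, the metric in the paper's title is reward-model accuracy on pairwise preference data. Below is a minimal, illustrative sketch of that metric; `reward_model` and `pairs` are hypothetical stand-ins for this sketch, not names taken from the repository.

```python
def pairwise_accuracy(reward_model, pairs):
    """Fraction of preference pairs where the reward model scores the
    chosen response above the rejected one. This is the 'accuracy' the
    paper's title refers to; per the paper, higher values need not
    yield a better RLHF-tuned language model.

    reward_model: hypothetical callable mapping a response string to a
                  scalar score (not an API from the repository).
    pairs:        iterable of (chosen, rejected) response strings.
    """
    pairs = list(pairs)
    correct = sum(reward_model(c) > reward_model(r) for c, r in pairs)
    return correct / max(len(pairs), 1)


# Toy usage with a stand-in scorer that just rewards longer responses.
toy_rm = len
print(pairwise_accuracy(toy_rm, [("a detailed answer", "no"), ("x", "yz")]))  # 0.5
```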
Alternatives and similar repositories for AccuracyParadox-RLHF
Users interested in AccuracyParadox-RLHF are comparing it to the repositories listed below.
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning" ☆10 · Updated 6 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 · Updated 3 months ago
- Source code of "Reinforcement Learning with Token-level Feedback for Controllable Text Generation" (NAACL 2024) ☆14 · Updated 7 months ago
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models ☆24 · Updated 8 months ago
- PreAct: Prediction Enhances Agent's Planning Ability (COLING 2025) ☆28 · Updated 7 months ago
- ☆14 · Updated last year
- ☆22 · Updated last year
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆62 · Updated 7 months ago
- Introducing Filtered Direct Preference Optimization (fDPO) that enhances language model alignment with human preferences by discarding lo… ☆15 · Updated 8 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆52 · Updated 5 months ago
- ☆35 · Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆51 · Updated 2 months ago
- Repository for Skill Set Optimization ☆14 · Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 · Updated last year
- ☆30 · Updated last year
- A framework for evolving and testing question-answering datasets with various models. ☆16 · Updated last year
- [ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling" ☆25 · Updated last year
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆38 · Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆35 · Updated 10 months ago
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens ☆25 · Updated last year
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024) ☆23 · Updated 11 months ago
- PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach ☆32 · Updated last year
- [ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling ☆16 · Updated 7 months ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism ☆30 · Updated last year
- Self-Supervised Alignment with Mutual Information ☆21 · Updated last year
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 · Updated last year
- This is the implementation of LeCo ☆31 · Updated 6 months ago
- The corresponding code from our paper "Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning… ☆12 · Updated 11 months ago
- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails ☆26 · Updated 5 months ago
- ☆48 · Updated 9 months ago