CyberAgentAILab / filtered-dpo
Introducing Filtered Direct Preference Optimization (fDPO), which improves language model alignment with human preferences by discarding preference samples whose chosen responses are lower quality than those generated by the learning model (a minimal sketch of the filtering step follows below).
☆16 · Updated 9 months ago
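The filtering idea in the description fits in a few lines. The sketch below is an illustration only, not the repository's implementation: `filter_preference_data`, `generate`, and `score` are hypothetical names, with `generate` standing in for sampling from the current policy and `score` for a reward-model quality estimate.

```python
from typing import Callable, Dict, List

Sample = Dict[str, str]  # keys: "prompt", "chosen", "rejected"

def filter_preference_data(
    dataset: List[Sample],
    generate: Callable[[str], str],      # assumed: sampler for the current policy
    score: Callable[[str, str], float],  # assumed: reward model, (prompt, response) -> quality
) -> List[Sample]:
    """Keep only samples whose chosen response outscores the policy's own output."""
    kept = []
    for sample in dataset:
        policy_response = generate(sample["prompt"])
        # Core fDPO idea: discard a preference pair when its chosen response is
        # lower quality than what the learning model already generates.
        if score(sample["prompt"], sample["chosen"]) >= score(sample["prompt"], policy_response):
            kept.append(sample)
    return kept
```

The filtered subset would then feed a standard DPO update; see the paper for how filtering is scheduled during training.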
Alternatives and similar repositories for filtered-dpo
Users interested in filtered-dpo are comparing it to the libraries listed below.
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…" ☆27 · Updated 2 months ago
- Directional Preference Alignment ☆59 · Updated last year
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective ☆36 · Updated last year
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆49 · Updated last month
- ☆14 · Updated last year