yinyueqin / relative-preference-optimization
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts
☆21 · Updated 11 months ago
Alternatives and similar repositories for relative-preference-optimization:
Users interested in relative-preference-optimization are comparing it with the repositories listed below.
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆42 · Updated 6 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆63 · Updated 3 weeks ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆54 · Updated 2 months ago
- Official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆23 · Updated 4 months ago
- ☆58 · Updated last month
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models ☆63 · Updated 3 months ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆69 · Updated 2 months ago
- A Survey on the Honesty of Large Language Models ☆53 · Updated 2 months ago
- Mosaic IT: Enhancing Instruction Tuning with Data Mosaics ☆17 · Updated last week
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆33 · Updated 3 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆42 · Updated 3 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆66 · Updated 6 months ago
- Reproduction of "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment" ☆66 · Updated last year
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆39 · Updated 3 months ago
- ☆94 · Updated last year
- Code for the paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" ☆65 · Updated last year
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards ☆48 · Updated 9 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆68 · Updated 8 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension ☆63 · Updated 8 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆49 · Updated 4 months ago
- [AAAI 2025 Oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆48 · Updated 2 months ago
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆35 · Updated 4 months ago
- Directional Preference Alignment ☆56 · Updated 4 months ago
- ☆32 · Updated last year
- M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆55 · Updated last month
- CLIP-MoE: Mixture of Experts for CLIP ☆23 · Updated 4 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat… ☆43 · Updated 6 months ago
- ☆28 · Updated 3 months ago
- Repo for the paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆45 · Updated 3 months ago
- ☆13 · Updated 7 months ago