liushunyu / awesome-direct-preference-optimization
A Survey of Direct Preference Optimization (DPO)
☆73 Updated last month
Alternatives and similar repositories for awesome-direct-preference-optimization
Users who are interested in awesome-direct-preference-optimization are comparing it to the repositories listed below.
- ☆102 Updated 2 months ago
- AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper) ☆293 Updated last month
- Paper List of Inference/Test Time Scaling/Computing ☆286 Updated last month
- ☆155 Updated 2 months ago
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond ☆277 Updated last month
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆81 Updated 5 months ago
- ☆252 Updated last month
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent." ☆83 Updated 8 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆120 Updated last month
- 📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning. ☆171 Updated last week
- Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆158 Updated 3 months ago
- paper list, tutorial, and nano code snippet for Diffusion Large Language Models. ☆96 Updated last month
- (ICLR'25) A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents ☆81 Updated 6 months ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆142 Updated 2 weeks ago
- One-shot Entropy Minimization ☆175 Updated last month
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ☆38 Updated 2 months ago
- A curated list of Awesome-LLM-Ensemble papers for the survey "Harnessing Multiple Large Language Models: A Survey on LLM Ensemble" ☆105 Updated last month
- ☆323 Updated last week
- OpenReview Submission Visualization (ICLR 2024/2025) ☆151 Updated 9 months ago
- Extrapolating RLVR to General Domains without Verifiers ☆134 Updated last week
- A research repo for experiments about Reinforcement Finetuning ☆49 Updated 4 months ago
- A collection of papers on discrete diffusion models ☆156 Updated last month
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆173 Updated 5 months ago
- ☆140 Updated 2 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large … ☆88 Updated 7 months ago
- RFTT: Reasoning with Reinforced Functional Token Tuning ☆28 Updated last month
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models ☆147 Updated 2 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆51 Updated 2 months ago
- [arXiv 2025] Efficient Reasoning Models: A Survey ☆247 Updated 2 weeks ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆146 Updated 3 weeks ago