sssth / awesome-DPO
Papers related to Direct Preference Optimization (DPO)
☆16Updated 8 months ago
Alternatives and similar repositories for awesome-DPO:
Users who are interested in awesome-DPO are comparing it to the repositories listed below.
- The latest progress of Personalized Large Language Models (LLMs).☆14Updated last week
- Awesome RL-based LLM Reasoning☆341Updated last week
- SOTA RL fine-tuning solution for advanced math reasoning of LLM☆91Updated this week
- Yelp Simulator for WWW'25 AgentSociety Challenge☆73Updated 3 weeks ago
- ☆117Updated 2 weeks ago
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…☆153Updated this week
- [NeurIPS 2024] The implementation of paper "On Softmax Direct Preference Optimization for Recommendation"☆65Updated 4 months ago
- A curated list of personalized alignment resources (continually updated).☆12Updated this week
- ☆33Updated last year
- ☆28Updated 6 months ago
- Paper list for Efficient Reasoning.☆311Updated this week
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization☆72Updated 7 months ago
- An index of algorithms for reinforcement learning from human feedback (RLHF)☆93Updated 11 months ago
- ☆44Updated 4 months ago
- Paper List of Inference/Test Time Scaling/Computing☆127Updated last week
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆179Updated last year
- ☆105Updated 6 months ago
- ☆48Updated last month
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.☆65Updated this week
- Repo of "Large Language Model-based Human-Agent Collaboration for Complex Task Solving" (EMNLP 2024 Findings)☆31Updated 6 months ago
- ☆81Updated 2 months ago
- ☆22Updated last week
- [ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"☆37Updated 3 months ago
- ☆14Updated 6 months ago
- Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process!☆41Updated 2 weeks ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆124Updated 3 months ago
- ☆186Updated this week
- Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied with elaborately-written concise de…☆49Updated 8 months ago
- code for GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation☆15Updated 3 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…☆119Updated 8 months ago