[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
☆53Jul 28, 2024Updated last year
Alternatives and similar repositories for DPO-ST
Users who are interested in DPO-ST are comparing it to the libraries listed below
- Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue (ACL Findings 2023)☆21Nov 10, 2025Updated 3 months ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆15Feb 9, 2026Updated 3 weeks ago
- Dataset Reset Policy Optimization☆31Apr 12, 2024Updated last year
- ☆33Jan 6, 2025Updated last year
- (ICML 2025) Rethinking Chain-of-Thought from the Perspective of Self-Training☆13Feb 15, 2025Updated last year
- PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach☆32Nov 6, 2023Updated 2 years ago
- PyTorch implementation of CARE☆16Oct 6, 2023Updated 2 years ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆61Aug 30, 2024Updated last year
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality☆21Oct 8, 2024Updated last year
- Agent-RRM: Exploring Reasoning Reward Model for Agents☆49Updated this week
- Code for the paper "Cottention: Linear Transformers With Cosine Attention"☆20Nov 15, 2025Updated 3 months ago
- Control LLM☆22Apr 6, 2025Updated 10 months ago
- ☆17Feb 4, 2025Updated last year
- [ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"☆18Jun 1, 2024Updated last year
- DPO, but faster 🚀☆48Dec 6, 2024Updated last year
- ☆72Jun 10, 2025Updated 8 months ago
- ☆18Feb 7, 2021Updated 5 years ago
- Code for Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback (ACL 2024 Findings)☆16Jul 2, 2024Updated last year
- PyTorch implementation of Planar Flow☆17Dec 2, 2019Updated 6 years ago
- ☆16Jul 23, 2024Updated last year
- Source code of our EMNLP 2024 paper "FactAlign: Long-form Factuality Alignment of Large Language Models"☆19Oct 3, 2024Updated last year
- ☆15Feb 21, 2024Updated 2 years ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".☆26Aug 9, 2025Updated 6 months ago
- [ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models☆23Mar 29, 2025Updated 11 months ago
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆392Jan 19, 2025Updated last year
- [ICML 2025] Official code of "AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization"☆30Jan 10, 2026Updated last month
- Memory-efficient fine-tuning; supports fine-tuning 7B models with 24 GB of GPU memory☆21May 26, 2024Updated last year
- A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza…☆20Nov 21, 2024Updated last year
- ☆19Mar 10, 2025Updated 11 months ago
- ☆21Aug 9, 2024Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆47Apr 15, 2025Updated 10 months ago
- ☆17Jul 3, 2017Updated 8 years ago
- ☆24Oct 21, 2024Updated last year
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment☆21Apr 2, 2024Updated last year
- [ICCV25] TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers☆41Jul 23, 2025Updated 7 months ago
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24)☆27Oct 3, 2025Updated 5 months ago
- Official code and data repository of MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Inte…☆22Jun 3, 2024Updated last year
- [ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models☆60Jul 23, 2024Updated last year