[ICML 2025] Official code of "AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization"
☆30Jan 10, 2026Updated last month
Alternatives and similar repositories for alpha-DPO
Users that are interested in alpha-DPO are comparing it to the libraries listed below
Sorting:
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective☆42Sep 18, 2025Updated 5 months ago
- ☆13Jan 22, 2025Updated last year
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi…☆14Jun 6, 2025Updated 8 months ago
- Aligning Agentic World Models via Knowledgeable Experience Learning☆31Jan 25, 2026Updated last month
- Official implementation for our paper: Rethinking Video Tokenization: A Conditioned Diffusion-based Approach☆14Apr 2, 2025Updated 11 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$☆50Oct 23, 2024Updated last year
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)☆33Sep 28, 2025Updated 5 months ago
- CS194-196 Course Project☆14Feb 20, 2025Updated last year
- KMM: Key Frame Mask Mamba for Extended Motion Generation☆19Sep 22, 2025Updated 5 months ago
- [ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"☆18Jun 1, 2024Updated last year
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"☆17Mar 31, 2025Updated 11 months ago
- ☆19Mar 25, 2025Updated 11 months ago
- ☆41Jan 4, 2026Updated 2 months ago
- [ACL 2025 Findings] Text2World: Benchmarking Large Language Models for Symbolic World Model Generation☆28Feb 25, 2025Updated last year
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"☆20Apr 9, 2025Updated 10 months ago
- [EMNLP 2024] Tree of Problems: Improving structured problem solving with compositionality☆19Mar 4, 2025Updated 11 months ago
- A simple implementation of ReasonGenRM.☆19Apr 21, 2025Updated 10 months ago
- [ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning".☆20Feb 26, 2025Updated last year
- Code, Data and Model for Paper "Learning from Peers in Reasoning Models"☆27May 13, 2025Updated 9 months ago
- Unifying Specialized Visual Encoders for Video Language Models☆25Nov 22, 2025Updated 3 months ago
- BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models☆40Oct 30, 2025Updated 4 months ago
- Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"☆23Mar 18, 2025Updated 11 months ago
- ☆17Aug 1, 2025Updated 7 months ago
- ☆23Feb 18, 2025Updated last year
- ☆31Sep 12, 2025Updated 5 months ago
- ☆21Feb 22, 2026Updated last week
- ☆21Aug 30, 2025Updated 6 months ago
- ☆26Jun 22, 2024Updated last year
- Official implementation of "MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model". Our co…☆25Dec 20, 2024Updated last year
- We introduce EMMET and unify model editing with popular algorithms ROME and MEMIT.☆25Dec 16, 2024Updated last year
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment☆21Apr 2, 2024Updated last year
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models☆39Jan 5, 2026Updated last month
- Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment [ICCV 2025] - Official implementation☆42Aug 5, 2025Updated 6 months ago
- ☆33Nov 18, 2025Updated 3 months ago
- ☆60Jan 12, 2026Updated last month
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- A Text2SQL benchmark for evaluation of Large Language Models☆41Feb 24, 2026Updated last week
- ☆24Nov 16, 2023Updated 2 years ago
- Experiments on the impact of depth in transformers and SSMs.☆40Oct 23, 2025Updated 4 months ago