Linear95 / DSPLinks

Domain-specific preference (DSP) data and customized RM fine-tuning.

☆25

Alternatives and similar repositories for DSP

Users that are interested in DSP are comparing it to the libraries listed below

Sorting:

swtheing / PF-PPO-RLHF
☆34Updated last year
Linear95 / APO
Code for ACL2024 paper - Adversarial Preference Optimization (APO).
☆56Updated last year
YuxiXie / SelfEval-Guided-Decoding
☆103Updated 2 years ago
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆58Updated last year
ValueCompass / Alignment-Goal-Survey
☆30Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆124Updated last year
Shentao-YANG / Preference_Grounded_Guidance
Source codes for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).
☆17Updated 11 months ago
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆63Updated last year
haozheji / exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
☆57Updated last year
Re-Align / AlignTDS
Analyzing LLM Alignment via Token distribution shift
☆17Updated last year
FreedomIntelligence / OVM
☆68Updated last year
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆51Updated 6 months ago
FranxYao / FlanT5-CoT-Specialization
Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.
☆132Updated 2 years ago
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆66Updated last year
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆76Updated last month
princeton-nlp / WhatICLLearns
[ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning
☆20Updated 2 years ago
liziniu / GEM
Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)
☆48Updated 6 months ago
RZFan525 / Awesome-ScalingLaws
A curated list of awesome resources dedicated to Scaling Laws for LLMs
☆80Updated 2 years ago
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆76Updated 6 months ago
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆38Updated last year
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization(TDPO)
☆148Updated 9 months ago
joeljang / RLPHF
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
☆111Updated 2 years ago
chang-github-00 / LLM-Predictive-Decoding
☆14Updated 4 months ago
ADaM-BJTU / W2SG
The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning”
☆17Updated last year
rookie-joe / AutoPSV
☆51Updated last year
TianHongZXY / CoRe
[ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement)
☆49Updated last year
GuanghaoYe / Emergence-of-Thinking
☆53Updated 9 months ago
gl-ybnbxb / BoNBoN
☆18Updated last year
holarissun / RewardModelingBeyondBradleyTerry
official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…
☆69Updated 8 months ago
junkangwu / beta-DPO
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
☆49Updated last year