glorgao / SelectiveDPOLinks

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

☆44

Alternatives and similar repositories for SelectiveDPO

Users that are interested in SelectiveDPO are comparing it to the libraries listed below

Sorting:

ZHZisZZ / modpo
[ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
☆91Updated last year
tmlr-group / NoisyRationales
[NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"
☆37Updated 3 months ago
tmlr-group / AR-Bench
[ICML 2025] "From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?"
☆45Updated 2 weeks ago
jinhaoduan / SAR
[ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
☆57Updated last year
guyuntian / CoT_benchmark
Code for "Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective"
☆20Updated 2 years ago
abhishekpanigrahi1996 / Skill-Localization-by-grafting
☆51Updated last year
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆85Updated 2 weeks ago
YangRui2015 / RiC
Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment"
☆77Updated 4 months ago
hkust-nlp / PEM_composition
[NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
☆61Updated last year
VITA-Group / SEAL
Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆44Updated 6 months ago
ChnQ / MI-Peaks
☆54Updated 3 months ago
boyiwei / alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆85Updated 6 months ago
tmlr-group / ECON
[ICML 2025] "From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium"
☆26Updated 3 months ago
MiaoXiong2320 / llm-uncertainty
code repo for ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs"
☆133Updated last year
srzer / MOD
Official code for "Decoding-Time Language Model Alignment with Multiple Objectives".
☆27Updated 11 months ago
ZHZisZZ / emulated-disalignment
[ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
☆38Updated last year
junkangwu / beta-DPO
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
☆49Updated last year
nik-dim / tall_masks
Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024]
☆51Updated last year
ZFancy / awesome-activation-engineering
A curated list of resources for activation engineering
☆107Updated 3 weeks ago
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆61Updated last year
jinzhuoran / RWKU
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
☆83Updated last year
uiuctml / Localize-and-Stitch
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
☆30Updated last month
ChnQ / TracingLLM
☆30Updated last year
2003pro / ScaleBiO
This is the official implementation of ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
☆21Updated last year
uw-nsl / safechain
[ACL 25] SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
☆20Updated 6 months ago
princeton-nlp / benign-data-breaks-safety
☆41Updated last year
EnnengYang / AdaMerging
AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024.
☆92Updated 11 months ago
Lingkai-Kong / RE-Control
Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective
☆34Updated 8 months ago
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆157Updated 6 months ago
yaojin17 / Unlearning_LLM
[ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"
☆60Updated last year