LiuAmber / RAHFLinks

[ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.acl-long.572/)

☆28

Alternatives and similar repositories for RAHF

Users that are interested in RAHF are comparing it to the libraries listed below

Sorting:

hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆62Updated last year
RUCAIBox / HaluEval-2.0
☆47Updated last year
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆67Updated 11 months ago
tatsu-lab / test_set_contamination
☆41Updated 2 years ago
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆62Updated 11 months ago
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆69Updated last year
YJiangcm / LTE
[ACL 2024] Learning to Edit: Aligning LLMs with Knowledge Editing
☆36Updated last year
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆118Updated last year
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆121Updated last year
dannyallover / overthinking_the_truth
☆29Updated last year
edenbiran / RippleEdits
Evaluating the Ripple Effects of Knowledge Editing in Language Models
☆55Updated last year
deeplearning-wisc / picle
Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)
☆26Updated last year
hahahawu / Long-to-Short-via-Model-Merging
Model merging is a highly efficient approach for long-to-short reasoning.
☆89Updated last month
BunsenFeng / AbstainQA
AbstainQA, ACL 2024
☆28Updated last year
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆99Updated last year
GAIR-NLP / alignment-for-honesty
☆76Updated last year
javiferran / sae_entities
☆63Updated 8 months ago
THU-KEG / RM-Bench
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆67Updated 4 months ago
pillowsofwind / Knowledge-Conflicts-Survey
[EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"
☆148Updated last year
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆69Updated 3 years ago
ShujinWu-0814 / ALOE
Public code repo for COLING 2025 paper "Aligning LLMs with Individual Preferences via Interaction"
☆38Updated 7 months ago
princeton-nlp / benign-data-breaks-safety
☆41Updated last year
cxcscmu / MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
☆76Updated last year
ADaM-BJTU / W2SG
The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning”
☆17Updated last year
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆132Updated 8 months ago
Zhou-Zoey / RMB-Reward-Model-Benchmark
☆45Updated 7 months ago
jinzhuoran / RWKU
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
☆86Updated last year
zepingyu0512 / neuron-attribution
code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆47Updated last year
JasonForJoy / Model-Editing-Hurt
EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
☆37Updated 5 months ago
OpenBMB / CPO
☆23Updated last year