weixuan-wang123 / SADILinks

☆13

Alternatives and similar repositories for SADI

Users that are interested in SADI are comparing it to the libraries listed below

Sorting:

deeplearning-wisc / picle
Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)
☆26Updated last year
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆62Updated last year
hkust-nlp / PEM_composition
[NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
☆61Updated last year
dannyallover / overthinking_the_truth
☆29Updated last year
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆69Updated 3 years ago
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆67Updated 11 months ago
declare-lab / resta
Restore safety in fine-tuned language models through task arithmetic
☆29Updated last year
jinzhuoran / RWKU
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
☆86Updated last year
JasonForJoy / Model-Editing-Hurt
EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
☆37Updated 5 months ago
SALT-NLP / Efficient_Unlearning
☆38Updated 2 years ago
Thartvigsen / GRACE
[NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
☆82Updated 11 months ago
yihuaihong / ConceptVectors
[EMNLP 2025 Main] ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"
☆38Updated 3 months ago
princeton-nlp / benign-data-breaks-safety
☆41Updated last year
ECNU-ICALK / MELO
[AAAI 2024] MELO: Enhancing Model Editing with Neuron-indexed Dynamic LoRA
☆25Updated last year
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆69Updated last year
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆118Updated last year
launchnlp / LitCab
☆25Updated 5 months ago
javiferran / sae_entities
☆63Updated 8 months ago
zepingyu0512 / neuron-attribution
code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆47Updated last year
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆84Updated 8 months ago
yizhongw / llm-temporal-alignment
Methods and evaluation for aligning language models temporally
☆30Updated last year
roeehendel / icl_task_vectors
☆101Updated 2 years ago
YJiangcm / LTE
[ACL 2024] Learning to Edit: Aligning LLMs with Knowledge Editing
☆36Updated last year
abhishekpanigrahi1996 / Skill-Localization-by-grafting
☆51Updated last year
avalonstrel / Mitigating-the-Alignment-Tax-of-RLHF
☆15Updated last year
ADaM-BJTU / W2SG
The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning”
☆17Updated last year
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆121Updated last year
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆43Updated 10 months ago
mt-upc / logit-explanations
☆14Updated 2 years ago
licong-lin / negative-preference-optimization
☆68Updated last year