joeljang / knowledge-unlearning

[ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models

☆84

Alternatives and similar repositories for knowledge-unlearning

Users who are interested in knowledge-unlearning compare it to the repositories listed below.

SALT-NLP / chain-of-thought-bias
☆28 Updated last year
SALT-NLP / Efficient_Unlearning
☆38 Updated 2 years ago
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆68 Updated 3 years ago
hongshi97 / CAD
Unofficial re-implementation of "Trusting Your Evidence: Hallucinate Less with Context-aware Decoding"
☆32 Updated 11 months ago
snw2021 / LLM_Unlearning_Papers
☆26 Updated last year
jaehunjung1 / Maieutic-Prompting
☆50 Updated 2 years ago
CharlesYu2000 / PCGU-UnlearningBias
☆17 Updated 2 years ago
dannyallover / overthinking_the_truth
☆29 Updated last year
joeljang / temporalwiki
[EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models
☆74 Updated last year
launchnlp / LitCab
☆25 Updated 5 months ago
McGill-NLP / bias-bench
ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models.
☆150 Updated 2 months ago
declare-lab / resta
Restore safety in fine-tuned language models through task arithmetic
☆29 Updated last year
Nanami18 / Snowballed_Hallucination
☆44 Updated last year
hkust-nlp / felm
GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
☆60 Updated last year
mt-upc / logit-explanations
☆14 Updated 2 years ago
princeton-nlp / MABEL
EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975
☆38 Updated last year
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆117 Updated last year
GXimingLu / Quark
☆75 Updated 2 years ago
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆83 Updated 8 months ago
EmpathYang / ADEPT
Source code and data for ADEPT: A DEbiasing PrompT Framework (AAAI-23).
☆15 Updated 11 months ago
katiekang1998 / llm_hallucinations
☆17 Updated last year
snu-mllab / Bayesian-Red-Teaming
Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23)
☆15 Updated 2 years ago
vinid / safety-tuned-llamas
ICLR 2024 paper. Showing properties of safety tuning and exaggerated safety.
☆88 Updated last year
joeljang / continual-knowledge-learning
[ICLR 2022] Towards Continual Knowledge Learning of Language Models
☆92 Updated 3 years ago
balevinstein / Probes
☆57 Updated 2 years ago
aviclu / ffn-values
☆67 Updated 2 years ago
roeehendel / icl_task_vectors
☆101 Updated 2 years ago
dongjunKANG / VIM
☆10 Updated 2 years ago
nayeon7lee / FactualityPrompt
☆86 Updated 3 years ago
Alrope123 / rethinking-demonstrations
☆176 Updated last year