joeljang / knowledge-unlearning
[ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models
☆80 · Updated 6 months ago
Alternatives and similar repositories for knowledge-unlearning:
Users interested in knowledge-unlearning are comparing it to the repositories listed below.
- ☆16 · Updated last year
- ☆38 · Updated last year
- ☆25 · Updated 6 months ago
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ☆66 · Updated 2 years ago
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models ☆70 · Updated 10 months ago
- [ICLR 2024] Paper showing properties of safety tuning and exaggerated safety. ☆78 · Updated 10 months ago
- ☆75 · Updated last year
- 🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Con… ☆40 · Updated last year
- ☆44 · Updated 6 months ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆58 · Updated last year
- ☆16 · Updated last year
- ☆25 · Updated last year
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆56 · Updated 5 months ago
- ☆30 · Updated 10 months ago
- ☆20 · Updated last week
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆69 · Updated 2 weeks ago
- ☆47 · Updated last year
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 · Updated 8 months ago
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156 ☆30 · Updated last year
- Restore safety in fine-tuned language models through task arithmetic ☆27 · Updated 11 months ago
- Unofficial re-implementation of "Trusting Your Evidence: Hallucinate Less with Context-aware Decoding" ☆28 · Updated 4 months ago
- Official code for the paper "Prompt Injection: Parameterization of Fixed Inputs" ☆32 · Updated 6 months ago
- ☆10 · Updated last year
- LoFiT: Localized Fine-tuning on LLM Representations ☆34 · Updated 2 months ago
- ☆22 · Updated 5 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆52 · Updated 4 months ago
- ☆42 · Updated last month
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… ☆32 · Updated last year
- ☆50 · Updated 8 months ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆93 · Updated last month