joeljang / knowledge-unlearning
[ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models
☆76Updated 2 months ago
Related projects ⓘ
Alternatives and complementary repositories for knowledge-unlearning
- ☆36Updated last year
- ☆13Updated last year
- ☆23Updated 2 months ago
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models☆66Updated 6 months ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"☆47Updated last month
- Restore safety in fine-tuned language models through task arithmetic☆26Updated 7 months ago
- ☆26Updated 6 months ago
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model☆65Updated 2 years ago
- [ICLR 2022] Towards Continual Knowledge Learning of Language Models☆93Updated 2 years ago
- ☆74Updated last year
- ☆24Updated last year
- ☆39Updated last year
- ☆24Updated 11 months ago
- ☆111Updated last year
- Unofficial re-implementation of "Trusting Your Evidence: Hallucinate Less with Context-aware Decoding"☆28Updated last year
- ☆16Updated 4 months ago
- ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety.☆71Updated 6 months ago
- About Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23)☆12Updated last year
- 🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Con…☆34Updated 11 months ago
- Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)☆54Updated 10 months ago
- [NeurIPS 2022 Workshop] A Case Study with Negated Prompts using T0 (3B, 11B), InstructGPT (350M-175B), GPT-3 (350M - 175B) & OPT (125M - …☆23Updated 2 years ago
- ☆16Updated last year
- Official code for the paper: Evaluating Copyright Takedown Methods for Language Models☆15Updated 4 months ago
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua…☆29Updated last year
- This repository contains the official code for the paper: "Prompt Injection: Parameterization of Fixed Inputs"☆32Updated 2 months ago
- EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975☆37Updated 11 months ago
- [NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"☆58Updated 11 months ago
- ☆80Updated 2 years ago
- Official code implementation of SKU, Accepted by ACL 2024 Findings☆11Updated 6 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.☆57Updated 2 weeks ago