skywalker023 / confaide
🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory"
⭐44 · Updated last year
Alternatives and similar repositories for confaide
Users interested in confaide are comparing it to the repositories listed below.
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ⭐36 · Updated last year
- ⭐44 · Updated 6 months ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ⭐82 · Updated 11 months ago
- ⭐39 · Updated 2 years ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ⭐59 · Updated 11 months ago
- Restore safety in fine-tuned language models through task arithmetic ⭐28 · Updated last year
- ⭐41 · Updated 10 months ago
- ⭐38 · Updated last year
- ⭐13 · Updated 2 years ago
- [ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs ⭐43 · Updated 3 months ago
- ⭐22 · Updated 2 years ago
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ⭐26 · Updated last year
- Official repository for Dataset Inference for LLMs ⭐37 · Updated last year
- [EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective" ⭐27 · Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ⭐110 · Updated 6 months ago
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" in Findings of NAACL 2022 ⭐30 · Updated 3 years ago
- [EMNLP 2024] Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ⭐35 · Updated 3 months ago
- Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23) ⭐15 · Updated 2 years ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ⭐79 · Updated 8 months ago
- [EMNLP 2025 Main] ConceptVectors benchmark and code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ⭐36 · Updated last week
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ⭐97 · Updated last year
- [ICLR 2024] Showing properties of safety tuning and exaggerated safety ⭐87 · Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity ⭐76 · Updated 5 months ago
- ⭐28 · Updated 11 months ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ⭐17 · Updated last year
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156 ⭐36 · Updated last year
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua…" ⭐34 · Updated 2 years ago
- ⭐26 · Updated last year
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models (NeurIPS 2024) ⭐78 · Updated 11 months ago
- The official repository of the paper "On the Exploitability of Instruction Tuning" ⭐64 · Updated last year