skywalker023 / confaide
🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory"
⭐46 · Updated last year
Alternatives and similar repositories for confaide
Users interested in confaide are comparing it to the repositories listed below
- ⭐46 · Updated 8 months ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ⭐83 · Updated last year
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ⭐60 · Updated last year
- Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23) ⭐15 · Updated 2 years ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ⭐36 · Updated last year
- Restore safety in fine-tuned language models through task arithmetic ⭐29 · Updated last year
- ⭐38 · Updated 2 years ago
- ⭐38 · Updated 2 years ago
- [EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective" ⭐32 · Updated last year
- ⭐41 · Updated last year
- ⭐43 · Updated 2 years ago
- [ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs ⭐47 · Updated 5 months ago
- [EMNLP 2025 Main] ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ⭐37 · Updated 2 months ago
- ⭐23 · Updated 2 years ago
- ⭐13 · Updated 3 years ago
- Official Repository for Dataset Inference for LLMs ⭐41 · Updated last year
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156 ⭐40 · Updated last year
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ⭐81 · Updated 10 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity ⭐83 · Updated 7 months ago
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ⭐37 · Updated 5 months ago
- ICLR 2024 paper showing properties of safety tuning and exaggerated safety ⭐87 · Updated last year
- ⭐28 · Updated last year
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ⭐17 · Updated last year
- ⭐29 · Updated last year
- Code for watermarking language models ⭐82 · Updated last year
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ⭐26 · Updated last year
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024 ⭐85 · Updated last year
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ⭐68 · Updated 3 years ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ⭐116 · Updated 8 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ⭐99 · Updated last year