skywalker023 / confaide
🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory"
★42, updated last year
Alternatives and similar repositories for confaide
Users who are interested in confaide are comparing it to the libraries listed below.
- ★44, updated 4 months ago
- ★13, updated 2 years ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ★81, updated 9 months ago
- Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23) ★15, updated last year
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ★35, updated last year
- ★36, updated 2 years ago
- ★38, updated last year
- Official code implementation of SKU, accepted by ACL 2024 Findings ★15, updated 6 months ago
- Restore safety in fine-tuned language models through task arithmetic ★28, updated last year
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ★59, updated 8 months ago
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" in Findings of NAACL 2022 ★29, updated 2 years ago
- ★35, updated 6 months ago
- ★21, updated last year
- ★22, updated 3 months ago
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156 ★33, updated last year
- Official Repository for Dataset Inference for LLMs ★34, updated 11 months ago
- ★41, updated 8 months ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ★17, updated 11 months ago
- ★26, updated 9 months ago
- Code for the WWW'23 paper "Sanitizing Sentence Embeddings (and Labels) for Local Differential Privacy" ★12, updated 2 years ago
- [ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs ★39, updated last month
- Code for "Universal Adversarial Triggers Are Not Universal." ★17, updated last year
- ★26, updated last year
- ★27, updated last year
- Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts" ★36, updated 11 months ago
- Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confid… ★23, updated 2 years ago
- ★54, updated 2 years ago
- Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI. ★50, updated last year
- Official repo for the paper: Recovering Private Text in Federated Learning of Language Models (in NeurIPS 2022) ★56, updated 2 years ago
- This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition. ★90, updated last year