skywalker023 / confaide
🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory"
☆34 Updated 11 months ago
Related projects
Alternatives and complementary repositories for confaide
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ☆76 Updated 2 months ago
- Official code for the paper: Evaluating Copyright Takedown Methods for Language Models ☆15 Updated 4 months ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆36 Updated 5 months ago
- ☆38 Updated last year
- Restore safety in fine-tuned language models through task arithmetic ☆26 Updated 7 months ago
- Official code implementation of SKU, Accepted by ACL 2024 Findings ☆11 Updated 6 months ago
- ☆12 Updated 2 years ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆47 Updated last month
- ☆22 Updated 11 months ago
- ☆31 Updated last year
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆63 Updated 10 months ago
- ☆36 Updated last year
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" on Findings of NAACL 2022 ☆27 Updated 2 years ago
- Official Repository for Dataset Inference for LLMs ☆23 Updated 3 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆28 Updated 4 months ago
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors ☆34 Updated 4 months ago
- [ICLR 2024] Provable Robust Watermarking for AI-Generated Text ☆26 Updated 11 months ago
- ☆12 Updated 3 months ago
- ☆20 Updated last year
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models ☆66 Updated 6 months ago
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156 ☆27 Updated 11 months ago
- Official Code Repository for the paper "Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-intensive Tasks… ☆34 Updated last month
- This repository contains the official code for the paper: "Prompt Injection: Parameterization of Fixed Inputs" ☆32 Updated 2 months ago
- Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23) ☆12 Updated last year
- ☆16 Updated 4 months ago
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆21 Updated 4 months ago
- ☆35 Updated 4 months ago
- ☆49 Updated last year
- EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975 ☆37 Updated 11 months ago
- ☆23 Updated 2 months ago