skywalker023 / confaide
🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory"
⭐ 34 · Updated 10 months ago
Related projects
Alternatives and complementary repositories for confaide
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 · ⭐ 36 · Updated 5 months ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models · ⭐ 76 · Updated 2 months ago
- Restore safety in fine-tuned language models through task arithmetic · ⭐ 26 · Updated 7 months ago
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" in Findings of NAACL 2022 · ⭐ 27 · Updated 2 years ago
- ⭐ 38 · Updated last year
- Official code for the paper: Evaluating Copyright Takedown Methods for Language Models · ⭐ 15 · Updated 3 months ago
- Official Repository for Dataset Inference for LLMs · ⭐ 23 · Updated 3 months ago
- ⭐ 35 · Updated last year
- ⭐ 12 · Updated 2 years ago
- ⭐ 31 · Updated last year
- ⭐ 48 · Updated last year
- EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975 · ⭐ 37 · Updated 10 months ago
- Influence Experiments · ⭐ 35 · Updated last year
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" · ⭐ 45 · Updated last month
- ⭐ 20 · Updated last year
- ⭐ 15 · Updated 3 months ago
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models · ⭐ 66 · Updated 5 months ago
- ⭐ 44 · Updated 2 months ago
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" · ⭐ 61 · Updated 10 months ago
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… · ⭐ 29 · Updated last year
- Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23) · ⭐ 12 · Updated last year
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024] · ⭐ 14 · Updated 6 months ago
- [ICLR 2024] Provable Robust Watermarking for AI-Generated Text · ⭐ 26 · Updated 11 months ago
- ⭐ 33 · Updated last year
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue · ⭐ 32 · Updated 3 weeks ago
- Min-K%++: Improved baseline for detecting pre-training data of LLMs https://arxiv.org/abs/2404.02936 · ⭐ 26 · Updated 5 months ago
- Data Valuation on In-Context Examples (ACL23) · ⭐ 23 · Updated 3 weeks ago
- ⭐ 23 · Updated last month
- ⭐ 24 · Updated 11 months ago
- Official code implementation of SKU, accepted to ACL 2024 Findings · ⭐ 11 · Updated 5 months ago