🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory"
☆50 · Dec 20, 2023 · Updated 2 years ago
Alternatives and similar repositories for confaide
Users who are interested in confaide are comparing it to the repositories listed below
- ☆24 · Aug 18, 2023 · Updated 2 years ago
- [Preprint] On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping ☆10 · Feb 27, 2020 · Updated 6 years ago
- Official code for ICML 2024 paper "Learning to Continually Learn with the Bayesian Principle" ☆20 · May 27, 2024 · Updated last year
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆37 · Jun 10, 2024 · Updated last year
- Official repository of the paper: Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code (Findings of EACL … ☆12 · Feb 11, 2026 · Updated 2 weeks ago
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆59 · May 31, 2024 · Updated last year
- Code for "CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples" (NDSS 2020) ☆22 · Nov 14, 2020 · Updated 5 years ago
- ☆19 · Mar 6, 2023 · Updated 2 years ago
- Code for our EMNLP 2020 paper: "Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness" ☆37 · Oct 12, 2020 · Updated 5 years ago
- https://icml.cc/virtual/2023/poster/24354 ☆10 · Aug 15, 2023 · Updated 2 years ago
- Official code for ACL 2023 (short, findings) paper "Recursion of Thought: A Divide and Conquer Approach to Multi-Context Reasoning with L… ☆45 · Jun 13, 2023 · Updated 2 years ago
- Code for paper: "RemovalNet: DNN model fingerprinting removal attack", IEEE TDSC 2023. ☆10 · Nov 27, 2023 · Updated 2 years ago
- The repository contains the code for analysing the leakage of personally identifiable (PII) information from the output of next word pred… ☆104 · Aug 13, 2024 · Updated last year
- ☆27 · Nov 20, 2023 · Updated 2 years ago
- [ICLR'24 Spotlight] DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer ☆46 · May 30, 2024 · Updated last year
- The official pytorch implementation of ACM MM 19 paper "MetaAdvDet: Towards Robust Detection of Evolving Adversarial Attacks" ☆11 · Jun 7, 2021 · Updated 4 years ago
- Implementation of Self-supervised-Online-Adversarial-Purification ☆13 · Aug 2, 2021 · Updated 4 years ago
- Code for Findings of ACL 2021 "Differential Privacy for Text Analytics via Natural Text Sanitization" ☆32 · Mar 15, 2022 · Updated 3 years ago
- ☆28 · Nov 28, 2023 · Updated 2 years ago
- Code release for DeepJudge (S&P'22) ☆52 · Mar 14, 2023 · Updated 2 years ago
- [NeurIPS'22] Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork. Haotao Wang, Junyuan Hong,… ☆15 · Nov 27, 2023 · Updated 2 years ago
- [EMNLP 2022] Distillation-Resistant Watermarking (DRW) for Model Protection in NLP ☆13 · Aug 17, 2023 · Updated 2 years ago
- ☆25 · Nov 14, 2022 · Updated 3 years ago
- Watermarking against model extraction attacks in MLaaS. ACM MM 2021. ☆34 · Jul 15, 2021 · Updated 4 years ago
- Submission Guide + Discussion Board for AI Singapore Online Safety Prize Challenge ☆14 · Mar 20, 2024 · Updated last year
- [ACL 2021] Learning to Perturb Word Embeddings for Out-of-distribution QA ☆16 · May 11, 2022 · Updated 3 years ago
- ☆16 · Jan 4, 2022 · Updated 4 years ago
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆70 · Feb 5, 2024 · Updated 2 years ago
- ☆13 · Oct 21, 2021 · Updated 4 years ago
- ☆17 · Nov 30, 2022 · Updated 3 years ago
- Code for ICLR 2025 Paper "GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment" ☆21 · Feb 10, 2025 · Updated last year
- The git repository of Modular Prompted Chatbot paper ☆35 · May 24, 2023 · Updated 2 years ago
- Contrastive Chain-of-Thought Prompting ☆68 · Nov 18, 2023 · Updated 2 years ago
- Code and benchmark for our Findings of ACL 2024 paper - "TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing… ☆21 · Dec 20, 2024 · Updated last year
- NLPCC-2025 Shared-Task 1: LLM-Generated Text Detection ☆15 · May 19, 2025 · Updated 9 months ago
- ☆18 · Oct 7, 2022 · Updated 3 years ago
- Private Adaptive Optimization with Side Information (ICML '22) ☆16 · Jun 23, 2022 · Updated 3 years ago
- Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024] ☆21 · Apr 15, 2024 · Updated last year
- Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks (ICLR '20) ☆33 · Nov 4, 2020 · Updated 5 years ago