thu-coai / Targeted-Data-Extraction
Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation"
☆22 · Updated last year
Alternatives and similar repositories for Targeted-Data-Extraction:
Users interested in Targeted-Data-Extraction are comparing it to the libraries listed below.
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆35 · Updated 9 months ago
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" (Findings of NAACL 2022) ☆29 · Updated 2 years ago
- ☆41 · Updated last month
- ☆18 · Updated 3 years ago
- Code and data of the EMNLP 2021 paper "Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer" ☆42 · Updated 2 years ago
- ☆53 · Updated 9 months ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 · Updated 7 months ago
- ☆25 · Updated last year
- ☆11 · Updated 2 years ago
- ☆21 · Updated last year
- A lightweight library for large language model (LLM) jailbreaking defense. ☆47 · Updated 4 months ago
- Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" ☆48 · Updated 10 months ago
- A collection of papers on models' trustworthy applications, intended to cover topics like model evaluation & analysis, security, c… ☆20 · Updated last year
- [TACL] Code for "Red Teaming Language Model Detectors with Language Models" ☆19 · Updated last year
- ☆24 · Updated 3 years ago
- Official implementation of the EMNLP 2021 paper "ONION: A Simple and Effective Defense Against Textual Backdoor Attacks" ☆32 · Updated 3 years ago
- ☆23 · Updated last year
- [ICLR'24 Spotlight] DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer ☆38 · Updated 9 months ago
- ☆37 · Updated last year
- Code for the paper "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (NAACL-…☆38Updated 3 years ago
- Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts"☆36Updated 8 months ago
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models☆27Updated last year
- ☆25Updated 5 months ago
- Code for "Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution" ☆31 · Updated last year
- ☆33 · Updated last year
- Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection" ☆17 · Updated 8 months ago
- A Survey of Hallucination in Large Foundation Models ☆54 · Updated last year
- Official Repository for Dataset Inference for LLMs ☆32 · Updated 7 months ago