brian-lou / Training-Data-Extraction-Attack-on-LLMsLinks

This project explores training data extraction attacks on the LLaMa 7B, GPT-2XL, and GPT-2-IMDB models to discover memorized content using perplexity, perturbation scoring metrics, and large scale search queries.

☆14

Alternatives and similar repositories for Training-Data-Extraction-Attack-on-LLMs

Users that are interested in Training-Data-Extraction-Attack-on-LLMs are comparing it to the libraries listed below

Sorting:

XHMY / AutoDefense
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
☆47Updated 2 weeks ago
poloclub / llm-self-defense
LLM Self Defense: By Self Examination, LLMs know they are being tricked
☆34Updated last year
wegodev2 / virtual-prompt-injection
Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆18Updated 11 months ago
arobey1 / advbench
☆43Updated 2 years ago
xirui-li / DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆52Updated 9 months ago
LostOxygen / llm-confidentiality
Whispers in the Machine: Confidentiality in Agentic Systems
☆39Updated 2 weeks ago
safr-ai-lab / survey-llm
A survey of privacy problems in Large Language Models (LLMs). Contains summary of the corresponding paper along with relevant code
☆67Updated last year
ZiyueWang25 / llm-security-challenge
Can Large Language Models Solve Security Challenges? We test LLMs' ability to interact and break out of shell environments using the Over…
☆13Updated last year
chawins / pal
PAL: Proxy-Guided Black-Box Attack on Large Language Models
☆51Updated 9 months ago
microsoft / analysing_pii_leakage
The repository contains the code for analysing the leakage of personally identifiable (PII) information from the output of next word pred…
☆96Updated 9 months ago
AI-secure / AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
☆128Updated last month
Phantivia / T-PGD
[Findings of ACL 2023] Bridge the Gap Between CV and NLP! A Optimization-based Textual Adversarial Attack Framework.
☆13Updated last year
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆35Updated 10 months ago
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆79Updated 8 months ago
TrustAIResearch / MLHospital
☆44Updated 2 years ago
verazuo / prompt-stealing-attack
[USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models
☆37Updated 4 months ago
JonasGeiping / carving
Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives
☆69Updated last year
papersPapers / BadPrompt
Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts"
☆36Updated 11 months ago
LukasStruppek / Plug-and-Play-Attacks
[ICML 2022 / ICLR 2024] Source code for our papers "Plug & Play Attacks: Towards Robust and Flexible Model Inversion Attacks" and "Be C…
☆42Updated 10 months ago
byerose / Awesome-Foundation-Model-Security
A curated list of trustworthy Generative AI papers. Daily updating...
☆73Updated 9 months ago
aengusl / latent-adversarial-training
☆39Updated 8 months ago
phycholosogy / RAG-privacy
The code for paper "The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)", exploring the privacy risk o…
☆48Updated 4 months ago
amazon-science / controlling-llm-memorization
☆36Updated 2 years ago
Vaidehi99 / InfoDeletionAttacks
☆44Updated 4 months ago
arobey1 / smooth-llm
☆98Updated last year
JailbreakBench / artifacts
Jailbreak artifacts for JailbreakBench
☆60Updated 7 months ago
BHui97 / PLeak
☆57Updated 5 months ago
pasquini-dario / LLM_NeuralExec
Code to generate NeuralExecs (prompt injection for LLMs)
☆22Updated 6 months ago
zihao-ai / unthinking_vulnerability
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
☆30Updated 2 weeks ago
allenai / wildguard
Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
☆79Updated 6 months ago