brian-lou / Training-Data-Extraction-Attack-on-LLMs
This project explores training data extraction attacks on the LLaMA 7B, GPT-2 XL, and GPT-2-IMDB models, using perplexity and perturbation scoring metrics together with large-scale search queries to discover memorized content.
☆15 Updated 2 years ago
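As a rough, hedged sketch of the scoring step such an attack relies on (not this repository's actual code), the Python below ranks candidate generations with a perplexity score, a perplexity-to-zlib-compression ratio, and a crude random-token perturbation score; the `gpt2-xl` checkpoint, helper names, and placeholder samples are illustrative assumptions.

```python
# Hedged sketch, not the repository's implementation: score candidate
# generations with (1) perplexity, (2) a perplexity-to-zlib-entropy ratio,
# and (3) a crude perturbation score. Model choice and sample list are
# illustrative assumptions.
import random
import zlib

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl").to(device).eval()


@torch.no_grad()
def perplexity(text: str) -> float:
    """Per-token perplexity of `text` under the causal LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    loss = model(ids, labels=ids).loss  # mean cross-entropy; labels are shifted internally
    return torch.exp(loss).item()


def zlib_ratio(text: str) -> float:
    """Perplexity divided by zlib-compressed byte length: memorized text tends
    to show unusually low perplexity relative to its compression entropy."""
    return perplexity(text) / len(zlib.compress(text.encode("utf-8")))


@torch.no_grad()
def perturbation_score(text: str, n_perturbations: int = 8) -> float:
    """Average perplexity increase after replacing one random token; memorized
    sequences usually sit in sharp likelihood minima, so small edits hurt a lot."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0].tolist()
    base = perplexity(text)
    deltas = []
    for _ in range(n_perturbations):
        perturbed = list(ids)
        perturbed[random.randrange(len(perturbed))] = random.randrange(tokenizer.vocab_size)
        deltas.append(perplexity(tokenizer.decode(perturbed)) - base)
    return sum(deltas) / n_perturbations


# Rank candidates: lowest zlib ratio (or highest perturbation score) is most suspect.
samples = ["candidate generation one", "candidate generation two"]  # placeholders
ranked = sorted(samples, key=zlib_ratio)
for text in ranked[:10]:
    print(f"{zlib_ratio(text):8.3f}  {perplexity(text):8.2f}  {text[:60]!r}")
```

Low perplexity paired with high zlib entropy is the classic memorization signal from Carlini et al.'s "Extracting Training Data from Large Language Models"; the perturbation score here is only a simple stand-in for whatever perturbation metric the project actually implements.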
Alternatives and similar repositories for Training-Data-Extraction-Attack-on-LLMs
Users interested in Training-Data-Extraction-Attack-on-LLMs are comparing it to the repositories listed below.
- AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks ☆56 Updated 5 months ago
- Whispers in the Machine: Confidentiality in Agentic Systems ☆41 Updated 3 weeks ago
- The repository contains the code for analysing the leakage of personally identifiable information (PII) from the output of next word pred… ☆101 Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆172 Updated 7 months ago
- Can Large Language Models Solve Security Challenges? We test LLMs' ability to interact and break out of shell environments using the Over… ☆13 Updated 2 years ago
- ☆65 Updated 10 months ago
- LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins ☆28 Updated last year
- Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers ☆64 Updated last year
- LLM Self Defense: By Self Examination, LLMs know they are being tricked ☆43 Updated last year
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆162 Updated 6 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆55 Updated last year
- Code & Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] ☆94 Updated last year
- Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content" ☆22 Updated last year
- ☆109 Updated 6 months ago
- Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection" ☆23 Updated last year
- This repository provides a benchmark for prompt injection attacks and defenses ☆318 Updated this week
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆359 Updated 9 months ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆74 Updated 2 months ago
- [Findings of ACL 2023] Bridge the Gap Between CV and NLP! An Optimization-based Textual Adversarial Attack Framework. ☆13 Updated 2 years ago
- The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang… ☆141 Updated 2 months ago
- ☆153 Updated last year
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024) ☆152 Updated 11 months ago
- [ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs` ☆90 Updated 2 months ago
- Code for Findings-ACL 2023 paper: Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Rec… ☆48 Updated last year
- ☆43 Updated 2 years ago
- A curated list of trustworthy Generative AI papers. Daily updating... ☆75 Updated last year
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks. ☆87 Updated last year
- ☆47 Updated 7 months ago
- LLM security and privacy ☆51 Updated last year
- [ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models ☆41 Updated last year