TrustAIRLab / JailbreakLLMs
A dataset of 6,387 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets, including 666 jailbreak prompts.
☆15 · Updated last year
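For orientation, below is a minimal sketch of how such a prompt collection could be loaded and split into jailbreak and regular prompts. The file name and column names (`prompt`, `platform`, `jailbreak`) are assumptions for illustration, not the repository's actual schema; consult the repository for the real data layout.

```python
# Minimal sketch, assuming the prompts are exported to a CSV with
# hypothetical columns "prompt", "platform", and a boolean "jailbreak".
import pandas as pd

df = pd.read_csv("prompts.csv")  # hypothetical export of the 6,387 prompts

# Count prompts per collection source (e.g., Reddit, Discord, websites, datasets).
print(df["platform"].value_counts())

# Separate the jailbreak subset (666 prompts) from the regular prompts.
jailbreaks = df[df["jailbreak"]]
print(f"{len(jailbreaks)} jailbreak prompts out of {len(df)} total")
```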
Alternatives and similar repositories for JailbreakLLMs
Users interested in JailbreakLLMs are comparing it to the repositories listed below.
- The most comprehensive and accurate LLM jailbreak attack benchmark by far · ☆22 · Updated 10 months ago
- ☆14 · Updated last year
- Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs · ☆12 · Updated last year
- 🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access · ☆52 · Updated 8 months ago
- ☆37 · Updated last year
- A curated list of Machine Learning Security & Privacy papers published in security top-4 conferences (IEEE S&P, ACM CCS, USENIX Security… · ☆327 · Updated 2 months ago
- A list of recent papers about adversarial learning · ☆304 · Updated this week
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models · ☆49 · Updated last year
- The automated prompt injection framework for LLM-integrated applications · ☆253 · Updated last year
- ☆78 · Updated last year
- Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts · ☆564 · Updated last year
- TAP: An automated jailbreaking method for black-box LLMs · ☆217 · Updated last year
- Code for the paper "The Philosopher's Stone: Trojaning Plugins of Large Language Models" · ☆26 · Updated last year
- Safety at Scale: A Comprehensive Survey of Large Model Safety · ☆224 · Updated 2 months ago
- [USENIX Security '24] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities agai… · ☆56 · Updated 10 months ago
- ☆224 · Updated 5 months ago
- [USENIX Security '24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a… · ☆113 · Updated last year
- Source code for Data-free Backdoor, accepted at the 32nd USENIX Security Symposium (USENIX Security 2023) · ☆34 · Updated 2 years ago
- A benchmark for prompt injection attacks and defenses in LLMs · ☆384 · Updated 3 months ago
- PFI: Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents · ☆26 · Updated 10 months ago
- Official implementation of the paper DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers · ☆66 · Updated last year
- A list of recent adversarial attack and defense papers (including those on large language models) · ☆46 · Updated 2 weeks ago
- A curated list of papers & resources on data poisoning, backdoor attacks, and defenses against them (no longer maintained) · ☆286 · Updated last year
- Papers and resources related to the security and privacy of LLMs 🤖 · ☆559 · Updated 7 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts · ☆191 · Updated 7 months ago
- Code for the Findings-EMNLP 2023 paper Multi-step Jailbreaking Privacy Attacks on ChatGPT · ☆35 · Updated 2 years ago
- ☆27 · Updated 4 years ago
- Repository for the AAAI 2024 (Oral) paper Visual Adversarial Examples Jailbreak Large Language Models · ☆265 · Updated last year
- ☆19 · Updated last year
- Code for Voice Jailbreak Attacks Against GPT-4o · ☆36 · Updated last year