TrustAI-laboratory / Many-Shot-Jailbreaking-Demo
Research on "Many-Shot Jailbreaking" in Large Language Models (LLMs). It demonstrates a technique capable of bypassing the safety mechanisms of LLMs, including models developed by Anthropic and other leading AI organizations.
☆11 · Updated 11 months ago
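Many-shot jailbreaking, as publicly described, relies on filling a long context window with a large number of faux user/assistant exchanges before the real query, so the model continues the pattern demonstrated in the shots. The sketch below is a minimal, benign illustration of how such a prompt could be assembled and scaled by shot count; it is not code from this repository, and the names `build_many_shot_prompt` and `faux_dialogues` are hypothetical.

```python
# Minimal sketch of many-shot prompt construction (illustrative, not this repo's code).
# Idea: concatenate many faux user/assistant turns, then append the real question,
# so the model is nudged to continue the demonstrated answering pattern.
from typing import List, Tuple


def build_many_shot_prompt(faux_dialogues: List[Tuple[str, str]], target_question: str) -> str:
    """Assemble a many-shot prompt from (user, assistant) pairs plus a final question."""
    turns = []
    for user_msg, assistant_msg in faux_dialogues:
        turns.append(f"User: {user_msg}")
        turns.append(f"Assistant: {assistant_msg}")
    turns.append(f"User: {target_question}")
    turns.append("Assistant:")  # leave the final turn open for the model to complete
    return "\n".join(turns)


if __name__ == "__main__":
    # Placeholder, benign shot contents; the research varies the number of shots
    # (from a handful to hundreds) and measures how model behavior changes.
    shots = [(f"example question {i}", f"example answer {i}") for i in range(256)]
    prompt = build_many_shot_prompt(shots, "final question goes here")
    print(f"{len(shots)} shots, prompt length: {len(prompt)} characters")
```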
Alternatives and similar repositories for Many-Shot-Jailbreaking-Demo
Users interested in Many-Shot-Jailbreaking-Demo are comparing it to the repositories listed below
- A dataset of 6,387 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets (including 666 jailbreak prompts). ☆12 · Updated last year
- The most comprehensive and accurate LLM jailbreak attack benchmark by far ☆19 · Updated 3 months ago
- [USENIX Security 2025] Official repo of the paper "PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs" ☆14 · Updated last month
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆320 · Updated 5 months ago
- Process the data from this Bluetooth module in Python from a PC ☆10 · Updated 6 years ago
- Implementation of the BEAST adversarial attack for language models (ICML 2024) ☆88 · Updated last year
- This is the official repository for the code used in the paper "What Was Your Prompt? A Remote Keylogging Attack on AI Assistants", USEN… ☆52 · Updated 5 months ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks. ☆72 · Updated last year
- Code for "Biometric Backdoors: A Poisoning Attack Against Unsupervised Template Updating" ☆11 · Updated 3 years ago
- ☆34 · Updated 9 months ago
- Security Attacks on LLM-based Code Completion Tools (AAAI 2025) ☆20 · Updated 2 months ago
- Code for our NeurIPS 2024 paper "Improved Generation of Adversarial Examples Against Safety-aligned LLMs" ☆11 · Updated 8 months ago
- Jailbreak artifacts for JailbreakBench ☆60 · Updated 8 months ago
- ☆91 · Updated last year
- This repository provides a benchmark for prompt injection attacks and defenses ☆245 · Updated last month
- Official implementation of the paper "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers" ☆53 · Updated 10 months ago
- Code for the paper "Watermarking Makes Language Models Radioactive" ☆17 · Updated 8 months ago
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] ☆372 · Updated 3 months ago
- AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks ☆51 · Updated last month
- TAP: An automated jailbreaking method for black-box LLMs ☆176 · Updated 7 months ago
- Basic 2G SMS and voice calls with a LimeNET Micro v2.1 and the Osmocom NITB stack ☆11 · Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆162 · Updated 3 months ago
- ☆186 · Updated 3 months ago
- Can Large Language Models Solve Security Challenges? We test LLMs' ability to interact with and break out of shell environments using the Over… ☆13 · Updated last year
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆156 · Updated 3 weeks ago
- A 2.4GHz band and WiFi analyzer toolkit made with the D1 Mini and NRF24L01 ☆38 · Updated 2 years ago
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models ☆39 · Updated 6 months ago
- ☆573 · Updated 2 weeks ago
- Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, data… ☆792 · Updated last week
- ☆11 · Updated last year