tpai / gandalf-prompt-injection-writeup
A writeup for the Gandalf prompt injection game.
☆36 · Updated last year
Alternatives and similar repositories for gandalf-prompt-injection-writeup:
Users interested in gandalf-prompt-injection-writeup are comparing it to the repositories listed below.
- ☆38 · Updated 3 weeks ago
- My inputs for the LLM Gandalf made by Lakera ☆41 · Updated last year
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆264 · Updated last month
- Implementation of the BEAST adversarial attack for language models (ICML 2024) ☆79 · Updated 9 months ago
- [Corca / ML] Automatically solved Gandalf AI with LLM ☆48 · Updated last year
- Fine-tuning base models to build robust task-specific models ☆27 · Updated 10 months ago
- Payloads for Attacking Large Language Models ☆74 · Updated 7 months ago
- Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts ☆451 · Updated 4 months ago
- LLM security and privacy ☆47 · Updated 4 months ago
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them! ☆283 · Updated 4 months ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks ☆58 · Updated 10 months ago
- ☆34 · Updated 3 months ago
- This repository provides an implementation to formalize and benchmark prompt injection attacks and defenses ☆172 · Updated last month
- Turning Gandalf against itself. Use LLMs to automate playing the Lakera Gandalf challenge without needing to set up an account with a platfor… ☆29 · Updated last year
- Tools and our test data developed for the HackAPrompt 2023 competition ☆30 · Updated last year
- Whispers in the Machine: Confidentiality in LLM-integrated Systems ☆33 · Updated 2 weeks ago
- Risks and targets for assessing LLMs & LLM vulnerabilities ☆30 · Updated 8 months ago
- ☆497 · Updated 2 months ago
- Tree of Attacks (TAP) Jailbreaking Implementation ☆99 · Updated last year
- TAP: An automated jailbreaking method for black-box LLMs ☆145 · Updated 2 months ago
- ☆64 · Updated last month
- [ACL24] Official repo of the paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs` ☆59 · Updated 2 months ago
- A benchmark for prompt injection detection systems ☆96 · Updated 2 weeks ago
- Dropbox LLM Security research code and results ☆220 · Updated 9 months ago
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] ☆298 · Updated 4 months ago
- Curation of prompts that are known to be adversarial to large language models ☆179 · Updated 2 years ago
- Package to optimize adversarial attacks against (large) language models with varied objectives ☆66 · Updated last year
- Codebase of https://arxiv.org/abs/2410.14923 ☆44 · Updated 4 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆49 · Updated 6 months ago