wagner-group/prompt-injection-defense

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/wagner-group/prompt-injection-defense)

wagner-group / prompt-injection-defense

Fine-tuning base models to build robust task-specific models

☆36

Alternatives and similar repositories for prompt-injection-defense

Users that are interested in prompt-injection-defense are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

alphadl / SafeLLM_with_IntentionAnalysis
View on GitHub
Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting
☆21Mar 25, 2024Updated 2 years ago
thestephencasper / explore_establish_exploit_llms
View on GitHub
☆31Jul 14, 2023Updated 3 years ago
jiah-li / magic
View on GitHub
The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.
☆15Dec 16, 2024Updated last year
serendipity1122 / Pre-trained-Model-Guided-Fine-Tuning-for-Zero-Shot-Adversarial-Robustness
View on GitHub
Code repository for CVPR2024 paper 《Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness》
☆25May 29, 2024Updated 2 years ago
facebookresearch / jailbreak-objectives
View on GitHub
Code and data to go with the Zhu et al. paper "An Objective for Nuanced LLM Jailbreaks"
☆37Jul 2, 2026Updated 2 weeks ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
huizhang-L / CodeChameleon
View on GitHub
☆30Mar 20, 2024Updated 2 years ago
Aatrox103 / SAP
View on GitHub
☆49May 9, 2024Updated 2 years ago
aivillage / llm_verification
View on GitHub
LLM prompt attacks for hacker CTFs via CTFd.
☆15Dec 17, 2023Updated 2 years ago
SheltonLiu-N / AutoDAN
View on GitHub
[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…
☆453Jan 22, 2025Updated last year
kriti-hippo / red_queen
View on GitHub
Red Queen Dataset and data generation template
☆26Dec 26, 2025Updated 6 months ago
kimdanny / Fair-RAG
View on GitHub
ICTIR 2025 "Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation"
☆15Sep 19, 2024Updated last year
jiaxiaojunQAQ / FGSM-PGI
View on GitHub
Code for Prior-Guided Adversarial Initialization for Fast Adversarial Training (ECCV2022)
☆28Nov 25, 2022Updated 3 years ago
Algorithmic-Alignment-Lab / CommonClaim
View on GitHub
Explore, Establish, Exploit: Red Teaming Language Models from Scratch
☆15Jun 21, 2023Updated 3 years ago
UCLAIS / ml-tutorials-season-3
View on GitHub
Materials for season 3 (2022/23) of the UCL Artificial Intelligence Society's machine learning tutorial series
☆12Mar 8, 2023Updated 3 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
YihanWang617 / llm-jailbreaking-defense
View on GitHub
A lightweight library for large laguage model (LLM) jailbreaking defense.
☆61Sep 11, 2025Updated 10 months ago
liu00222 / Open-Prompt-Injection
View on GitHub
This repository provides a benchmark for prompt injection attacks and defenses in LLMs
☆465Oct 29, 2025Updated 8 months ago
UCSB-NLP-Chang / SemanticSmooth
View on GitHub
Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing'
☆24Jun 9, 2024Updated 2 years ago
real-absolute-AI / Unnatural_Language
View on GitHub
The official repository of 'Unnatural Language Are Not Bugs but Features for LLMs'
☆24May 20, 2025Updated last year
Princeton-SysML / Jailbreak_LLM
View on GitHub
☆203Nov 26, 2023Updated 2 years ago
LLMSecurity / HouYi
View on GitHub
The automated prompt injection framework for LLM-integrated applications.
☆269Sep 12, 2024Updated last year
YihanWang617 / LLM-Jailbreaking-Defense-Backtranslation
View on GitHub
Code for paper "Defending aginast LLM Jailbreaking via Backtranslation"
☆34Aug 16, 2024Updated last year
pasquini-dario / LLM_NeuralExec
View on GitHub
Code to generate NeuralExecs (prompt injection for LLMs)
☆27Oct 5, 2025Updated 9 months ago
shinington / facesec
View on GitHub
Corresponding code to "FACESEC: A Fine-grained Robustness Evaluation Framework for Face Recognition Systems" @ CVPR 2021
☆13Jun 22, 2021Updated 5 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
PKU-ML / PAT
View on GitHub
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆22May 6, 2025Updated last year
qiuhuachuan / latent-jailbreak
View on GitHub
☆39May 21, 2024Updated 2 years ago
locuslab / acr-memorization
View on GitHub
☆41Dec 19, 2024Updated last year
DAMO-NLP-SG / multilingual-safety-for-LLMs
View on GitHub
[ICLR 2024]Data for "Multilingual Jailbreak Challenges in Large Language Models"
☆106Mar 7, 2024Updated 2 years ago
Alibaba-AAIG / Strata-Sword
View on GitHub
The Strata-Sword is a hierarchical Chinese-English jailbreak safety benchmark based on quantified reasoning complexity, developed in-hous…
☆22Sep 3, 2025Updated 10 months ago
shinington / Robust-PDF-Classifier-with-Conserved-Features
View on GitHub
Corresponding code to "Improving Robustness of ML Classifiers against Realizable Evasion Attacks Using Conserved Features" @ USENIX Secur…
☆11Aug 5, 2019Updated 6 years ago
Yuning-J / NVDFeatureAnalysis
View on GitHub
Correlate NVD datasets wIth CWE/CAPEC/CVSS labels for customised usage. Plus static analysis and data visualisation.
☆13Nov 17, 2023Updated 2 years ago
Lancern / llvm-anderson
View on GitHub
Anderson points-to analysis implementation based on LLVM
☆12Jan 3, 2021Updated 5 years ago
OSU-NLP-Group / AmpleGCG
View on GitHub
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆87Nov 3, 2024Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
lyh6560new / P3Sum
View on GitHub
The offical code for paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization"
☆10Jun 23, 2024Updated 2 years ago
zhaoyiran924 / Probe-Sampling
View on GitHub
[NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling
☆35Nov 8, 2024Updated last year
DavidFanzz / llm_decoding
View on GitHub
☆12Apr 25, 2025Updated last year
SheltonLiu-N / Universal-Prompt-Injection
View on GitHub
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆73Oct 23, 2024Updated last year
patrickrchao / JailbreakingLLMs
View on GitHub
☆756Jul 2, 2025Updated last year
shuzhenggao / ICL4code
View on GitHub
☆13Aug 9, 2023Updated 2 years ago
SprocketLab / roboshot
View on GitHub
☆24May 30, 2024Updated 2 years ago