GodXuxilie/PromptAttack

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/GodXuxilie/PromptAttack)

GodXuxilie / PromptAttack

An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)

☆116

Alternatives and similar repositories for PromptAttack

Users that are interested in PromptAttack are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Lslland / T-Vaccine
View on GitHub
☆19Jun 21, 2025Updated last year
zhliu0106 / learning-to-refuse
View on GitHub
Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"
☆10Dec 13, 2024Updated last year
yueliu1999 / FlipAttack
View on GitHub
[ICML 2025] An official source code for paper "FlipAttack: Jailbreak LLMs via Flipping".
☆178May 2, 2025Updated last year
Yu-Fangxu / COLD-Attack
View on GitHub
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆176Dec 18, 2024Updated last year
ethz-spylab / unlearning-vs-safety
View on GitHub
☆27Oct 6, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
CryptoAILab / JailbreakEval
View on GitHub
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆193Apr 1, 2025Updated last year
EnnengYang / Efficient-WEMoE
View on GitHub
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.
☆16Oct 28, 2024Updated last year
arobey1 / advbench
View on GitHub
☆45Mar 3, 2023Updated 3 years ago
HelloEveryboby / Butler
View on GitHub
Butler 是一个用于自动化服务管理和任务调度的工具项目。
☆17Updated this week
boyiwei / CoTaEval
View on GitHub
[NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models
☆17Jul 17, 2024Updated 2 years ago
OPTML-Group / Unlearn-Simple
View on GitHub
[NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning"
☆45Oct 3, 2025Updated 9 months ago
tmllab / 2025_ICLR_PiF
View on GitHub
☆40May 17, 2025Updated last year
OPTML-Group / WAGLE
View on GitHub
Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"
☆19Dec 16, 2024Updated last year
thunlp / Advbench
View on GitHub
Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria…
☆80Feb 19, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
GodXuxilie / RobustSSL_Benchmark
View on GitHub
Benchmark of robust self-supervised learning (RobustSSL) methods & Code for AutoLoRa (ICLR 2024).
☆19Dec 10, 2025Updated 7 months ago
LLM-Tuning-Safety / LLMs-Finetuning-Safety
View on GitHub
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…
☆358Feb 23, 2024Updated 2 years ago
ericyinyzy / VLAttack
View on GitHub
This is an official repository of ``VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models'' (NeurIPS 2…
☆69Mar 22, 2025Updated last year
DAMO-NLP-SG / multilingual-safety-for-LLMs
View on GitHub
[ICLR 2024]Data for "Multilingual Jailbreak Challenges in Large Language Models"
☆106Mar 7, 2024Updated 2 years ago
SproutNan / AI-Safety_Benchmark
View on GitHub
The official repository for guided jailbreak benchmark
☆31Jul 28, 2025Updated 11 months ago
cookielee77 / CLARE
View on GitHub
Contextualized Perturbation for Textual Adversarial Attack, NAACL 2021
☆44Jul 23, 2021Updated 5 years ago
Babelscape / ALERT
View on GitHub
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
☆60Sep 20, 2024Updated last year
chawins / llm-sp
View on GitHub
Papers and resources related to the security and privacy of LLMs 🤖
☆579Jun 8, 2025Updated last year
git-disl / Vaccine
View on GitHub
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆51Jan 15, 2026Updated 6 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
microsoft / BIPIA
View on GitHub
A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.
☆148Apr 15, 2024Updated 2 years ago
DSN-2024 / DSN
View on GitHub
DSN jailbreak Attack & Evaluation Ensemble
☆17Feb 7, 2026Updated 5 months ago
OPTML-Group / SOUL
View on GitHub
Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"
☆30Oct 1, 2024Updated last year
llm-misinformation / llm-misinformation
View on GitHub
The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"
☆85Jul 11, 2026Updated 2 weeks ago
weizeming / momentum-attack-llm
View on GitHub
☆25Jan 17, 2025Updated last year
vinid / safety-tuned-llamas
View on GitHub
ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety.
☆95May 9, 2024Updated 2 years ago
sduzpf / UAP_VLP
View on GitHub
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
☆24Aug 8, 2025Updated 11 months ago
1andrevich / antifilter-domain
View on GitHub
Generated geosite.dat based on Antifilter Community List
☆29Jul 19, 2026Updated last week
jaechan-repo / muse_bench
View on GitHub
☆33Aug 9, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
tsborkar / Selective-feature-regeneration
View on GitHub
Code/Models for Defending Against Universal Attacks Through Selective Feature Regeneration, CVPR 2020
☆10Jul 31, 2020Updated 5 years ago
uw-nsl / ArtPrompt
View on GitHub
[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`
☆102Aug 15, 2025Updated 11 months ago
patrickrchao / JailbreakingLLMs
View on GitHub
☆757Jul 2, 2025Updated last year
niklasrisse / LimitsOfML4Vuln
View on GitHub
☆26Feb 6, 2024Updated 2 years ago
princeton-nlp / benign-data-breaks-safety
View on GitHub
☆47Oct 1, 2024Updated last year
123000001212 / PoisonedEye
View on GitHub
Code of ICML 2025 paper "PoisonedEye: Knowledge Poisoning Attack on Retrieval-Augmented Generation based Large Vision-Language Models"
☆15Oct 30, 2025Updated 8 months ago
graldij / transformer-fusion
View on GitHub
Official repository of the "Transformer Fusion with Optimal Transport" paper, published as a conference paper at ICLR 2024.
☆31Apr 19, 2024Updated 2 years ago