SchwinnL / LLM_Embedding_AttackLinks

Code to conduct an embedding attack on LLMs

☆27

Alternatives and similar repositories for LLM_Embedding_Attack

Users that are interested in LLM_Embedding_Attack are comparing it to the libraries listed below

Sorting:

lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆91Updated last year
SheltonLiu-N / Universal-Prompt-Injection
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆60Updated 11 months ago
weizeming / momentum-attack-llm
☆23Updated 8 months ago
ThuCCSLab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆173Updated 3 months ago
TrustAIRLab / VoiceJailbreakAttack
Code for Voice Jailbreak Attacks Against GPT-4o.
☆34Updated last year
thu-ml / Attack-Bard
☆101Updated last year
rotaryhammer / code-autodan
An unofficial implementation of AutoDAN attack on LLMs (arXiv:2310.15140)
☆43Updated last year
euanong / image-hijacks
Official codebase for Image Hijacks: Adversarial Images can Control Generative Models at Runtime
☆50Updated 2 years ago
Sizhe-Chen / StruQ
official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries
☆45Updated 2 months ago
ChenWu98 / agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
☆109Updated 7 months ago
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆71Updated 11 months ago
chawins / pal
PAL: Proxy-Guided Black-Box Attack on Large Language Models
☆55Updated last year
XuanChen-xc / RLbreaker
Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024)
☆12Updated 11 months ago
aengusl / latent-adversarial-training
☆43Updated last year
byerose / Awesome-Foundation-Model-Security
A curated list of trustworthy Generative AI papers. Daily updating...
☆74Updated last year
reds-lab / BEEAR
This is the official Gtihub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Lang…
☆17Updated last year
umd-huang-lab / VLM-Poisoning
Code for Neurips 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models"
☆55Updated 8 months ago
BHui97 / PLeak
☆63Updated 9 months ago
GodXuxilie / PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
☆99Updated 8 months ago
SaFoLab-WISC / JailBreakV_28K
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
☆77Updated 5 months ago
OSU-NLP-Group / EIA_against_webagent
☆35Updated last year
tmllab / 2025_ICLR_PiF
☆35Updated 4 months ago
Yu-Fangxu / COLD-Attack
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆166Updated 9 months ago
akumar2709 / OVERTHINK_public
☆40Updated 6 months ago
sail-sg / I-FSJ
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
☆65Updated 9 months ago
AI45Lab / ActorAttack
☆104Updated 8 months ago
xirui-li / DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆62Updated last year
AI45Lab / CodeAttack
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆52Updated last week
lapisrocks / rpo
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
☆58Updated last year
arobey1 / smooth-llm
☆109Updated last year