Lyz1213 / BadEditLinks

☆36

Alternatives and similar repositories for BadEdit

Users that are interested in BadEdit are comparing it to the libraries listed below

Sorting:

inspire-group / RobustRAG
☆21Updated last year
SolidShen / RIPPLE_official
☆20Updated last year
rotaryhammer / code-autodan
An unofficial implementation of AutoDAN attack on LLMs (arXiv:2310.15140)
☆44Updated last year
byerose / Awesome-Foundation-Model-Security
A curated list of trustworthy Generative AI papers. Daily updating...
☆75Updated last year
PKU-ML / PAT
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆21Updated 6 months ago
CryptoAILab / misalignment
[NDSS'25] The official implementation of safety misalignment.
☆17Updated 10 months ago
cnut1648 / Model-Fingerprint
Fingerprint large language models
☆44Updated last year
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆43Updated last year
wegodev2 / virtual-prompt-injection
Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆26Updated last year
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29Updated last year
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆99Updated last year
wang2226 / Trojan-Activation-Attack
[CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.
☆29Updated last year
SproutNan / AI-Safety_SCAV
This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"
☆46Updated last month
WUSTL-CSPL / LLMJailbreak
☆37Updated last year
lancopku / codable-watermarking-for-llm
Repository for Towards Codable Watermarking for Large Language Models
☆38Updated 2 years ago
AI45Lab / ActorAttack
☆112Updated 10 months ago
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
☆66Updated 8 months ago
AI45Lab / CodeAttack
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆55Updated 2 months ago
qingjiesjtu / USC
This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.
☆63Updated 11 months ago
CryptoAILab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆180Updated 5 months ago
NY1024 / Foundation-Model-Paper-Notes
☆70Updated 6 months ago
yuplin2333 / representation-space-jailbreak
Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794…
☆22Updated last year
Gwinhen / DRUPE
Distribution Preserving Backdoor Attack in Self-supervised Learning
☆20Updated last year
thunlp / HiddenKiller
Code and data of the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger"
☆43Updated 3 years ago
sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆218Updated 2 weeks ago
AI-secure / MMDT
Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models
☆24Updated 8 months ago
RUCAIBox / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆33Updated last year
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆77Updated last year
BHui97 / PLeak
☆70Updated 11 months ago
OSU-NLP-Group / EIA_against_webagent
☆36Updated last year