dangne / tmd

[EMNLP'22] Textual Manifold-based Defense Against Natural Language Adversarial Examples

☆11

Alternatives and similar repositories for tmd:

Users that are interested in tmd are comparing it to the libraries listed below

Jayfeather1024 / DensePure
☆18Updated last year
rmin2000 / adv_tracing
Identification of the Adversary from a Single Adversarial Example (ICML 2023)
☆9Updated 8 months ago
JinyiW / GuidedDiffusionPur
☆54Updated last year
UCSB-NLP-Chang / SelfDenoise
☆13Updated 10 months ago
RockyLzy / TextDefender
codes for "Searching for an Effective Defender:Benchmarking Defense against Adversarial Word Substitution"
☆31Updated last year
NYU-DICE-Lab / circumventing-concept-erasure
☆19Updated last year
itsvaibhav01 / Immune
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
☆10Updated 3 weeks ago
UCSC-VLAA / AdvXL
[CVPR 2024] This repository includes the official implementation our paper "Revisiting Adversarial Training at Scale"
☆19Updated 11 months ago
inspire-group / proxy-distributions
[ICLR 2022 official code] Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness?
☆29Updated 3 years ago
dugu9sword / dne
ACL 2021 - Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble
☆17Updated last year
haonan3 / ICML-2024-Oral-SilentBadDiffusion
☆11Updated 4 months ago
lancopku / SOS
Code for the paper "Rethinking Stealthiness of Backdoor Attack against NLP Models" (ACL-IJCNLP 2021)
☆23Updated 3 years ago
Haochen-Luo / CroPA
☆40Updated 3 months ago
YitingQu / unsafe-diffusion
☆31Updated 8 months ago
lancopku / RAP
Code for the paper "RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models" (EMNLP 2021)
☆24Updated 3 years ago
ffhibnese / CGNC_Targeted_Adversarial_Attacks
[ECCV-2024] Transferable Targeted Adversarial Attack, CLIP models, Generative adversarial network, Multi-target attacks
☆31Updated 8 months ago
Sadcardation / MLLM-Refusal
Repository for the Paper: Refusing Safe Prompts for Multi-modal Large Language Models
☆13Updated 5 months ago
zjiehang / RanMASK
For Certified Robustness to Text Adversarial Attacks by Randomized [MASK]
☆15Updated 5 months ago
AISG-Technology-Team / GCSS-Track-2A-Submission-Guide
Submission Guide + Discussion Board for AI Singapore Global Challenge for Safe and Secure LLMs (Track 2A).
☆11Updated 2 months ago
chenweixin107 / TrojDiff
☆59Updated 2 years ago
AISafety-HKUST / Backdoor_Safety_Tuning
Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight)
☆25Updated 4 months ago
ys-zong / VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
☆62Updated 2 months ago
UCSC-VLAA / vllm-safety-benchmark
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
☆78Updated last year
OPTML-Group / AdvUnlearn
Official implementation of NeurIPS'24 paper "Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Model…
☆39Updated 4 months ago
BrachioLab / adversarial_prompting
☆53Updated last year
ShawnXYang / C-GSP
☆13Updated 2 years ago
cvlab-columbia / ZSRobust4FoundationModel
☆41Updated last year
TreeLLi / APT
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
☆47Updated 3 months ago
ShannonAI / backdoor_nlg
☆15Updated 3 years ago
UCSC-VLAA / AttnGCG-attack
☆14Updated 5 months ago