dangne / tmd
[EMNLP'22] Textual Manifold-based Defense Against Natural Language Adversarial Examples
☆11 · Updated last year
Alternatives and similar repositories for tmd:
Users interested in tmd are comparing it to the repositories listed below.
- ☆18 · Updated last year
- Identification of the Adversary from a Single Adversarial Example (ICML 2023) ☆9 · Updated 8 months ago
- ☆54 · Updated last year
- ☆13 · Updated 10 months ago
- Code for "Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution" ☆31 · Updated last year
- ☆19 · Updated last year
- [CVPR 2025] Official repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment ☆10 · Updated 3 weeks ago
- [CVPR 2024] Official implementation of the paper "Revisiting Adversarial Training at Scale" ☆19 · Updated 11 months ago
- [ICLR 2022 official code] Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness? ☆29 · Updated 3 years ago
- [ACL 2021] Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble ☆17 · Updated last year
- ☆11 · Updated 4 months ago
- Code for the paper "Rethinking Stealthiness of Backdoor Attack against NLP Models" (ACL-IJCNLP 2021) ☆23 · Updated 3 years ago
- ☆40 · Updated 3 months ago
- ☆31 · Updated 8 months ago
- Code for the paper "RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models" (EMNLP 2021) ☆24 · Updated 3 years ago
- [ECCV 2024] Transferable targeted adversarial attacks; CLIP models, generative adversarial networks, multi-target attacks ☆31 · Updated 8 months ago
- Repository for the paper "Refusing Safe Prompts for Multi-modal Large Language Models" ☆13 · Updated 5 months ago
- Code for "Certified Robustness to Text Adversarial Attacks by Randomized [MASK]" ☆15 · Updated 5 months ago
- Submission guide and discussion board for the AI Singapore Global Challenge for Safe and Secure LLMs (Track 2A) ☆11 · Updated 2 months ago
- ☆59 · Updated 2 years ago
- Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight) ☆25 · Updated 4 months ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models ☆62 · Updated 2 months ago
- [ECCV 2024] Official PyTorch implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ☆78 · Updated last year
- Official implementation of the NeurIPS 2024 paper "Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Model…" ☆39 · Updated 4 months ago
- ☆53 · Updated last year
- ☆13 · Updated 2 years ago
- ☆41 · Updated last year
- One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models ☆47 · Updated 3 months ago
- ☆15 · Updated 3 years ago
- ☆14 · Updated 5 months ago