Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP".
☆71Feb 19, 2023Updated 3 years ago
Alternatives and similar repositories for Advbench
Users that are interested in Advbench are comparing it to the libraries listed below
Sorting:
- ☆44Mar 3, 2023Updated 3 years ago
- Code base for the EMNLP 2021 paper, "Multi-granularity Textual Adversarial Attack with Behavior Cloning".☆13Apr 18, 2022Updated 3 years ago
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…☆88May 9, 2025Updated 9 months ago
- ☆32Aug 9, 2024Updated last year
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- [NAACL 2022] "SemAttack: Natural Textual Attacks via Different Semantic Spaces" by Boxin Wang, Chejian Xu, Xiangyu Liu, Yu Cheng, Bo Li☆21Jun 11, 2022Updated 3 years ago
- Mostly recording papers about models' trustworthy applications. Intending to include topics like model evaluation & analysis, security, c…☆21May 30, 2023Updated 2 years ago
- Official repo for the paper "Make Some Noise: Reliable and Efficient Single-Step Adversarial Training" (https://arxiv.org/abs/2202.01181)☆25Oct 17, 2022Updated 3 years ago
- An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)☆113Jan 21, 2025Updated last year
- Official code for the paper "Does CLIP's Generalization Performance Mainly Stem from High Train-Test Similarity?" (ICLR 2024)☆10Aug 26, 2024Updated last year
- ☆10Oct 28, 2020Updated 5 years ago
- ☆27Oct 6, 2024Updated last year
- ☆26Jun 5, 2024Updated last year
- [NDSS 2026] Official repo for Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography☆29Jan 2, 2026Updated 2 months ago
- ☆58Aug 11, 2024Updated last year
- ☆14May 7, 2024Updated last year
- Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge☆14Feb 20, 2024Updated 2 years ago
- Code for ICCV2025 paper——IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves☆17Jul 11, 2025Updated 7 months ago
- Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"☆29Oct 1, 2024Updated last year
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".☆34Sep 16, 2023Updated 2 years ago
- ☆13Nov 7, 2023Updated 2 years ago
- DILMA: Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers☆12Oct 7, 2020Updated 5 years ago
- Code Repository for "A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models".☆15Oct 14, 2022Updated 3 years ago
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.☆16Oct 28, 2024Updated last year
- Sensitive-rs is a Rust library for finding, validating, filtering, and replacing sensitive words. It provides efficient algorithms to han…☆22Feb 4, 2026Updated last month
- Code repository for CVPR2024 paper 《Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness》☆25May 29, 2024Updated last year
- Repository for the Bias Benchmark for QA dataset.☆138Jan 8, 2024Updated 2 years ago
- Generated geosite.dat based on Antifilter Community List☆25Updated this week
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆128Feb 24, 2025Updated last year
- The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"☆80Nov 9, 2024Updated last year
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models☆17Jul 17, 2024Updated last year
- Interview-based evaluation of LLMs☆25Jan 8, 2025Updated last year
- Dynamic Adversarial Benchmarking platform☆26Jun 22, 2022Updated 3 years ago
- [ICML 2025] X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP☆39Feb 3, 2026Updated last month
- 🌏 UI component library for the future, based on WebComponent.☆23Nov 12, 2024Updated last year
- This repo contains code for the paper: "Can Foundation Models Help Us Achieve Perfect Secrecy?"☆24Feb 9, 2023Updated 3 years ago
- ☆21May 13, 2022Updated 3 years ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts☆191Jun 26, 2025Updated 8 months ago
- ☆164Sep 2, 2024Updated last year