LIONS-EPFL / Charmer
Revisiting Character-level Adversarial Attacks for Language Models, ICML 2024
☆9 Updated 4 months ago
Related projects
Alternatives and complementary repositories for Charmer
- ☆13 Updated 2 months ago
- Machine Learning & Security Seminar @Purdue University ☆25 Updated last year
- ☆20 Updated 9 months ago
- ☆18 Updated last month
- This repository is the official implementation of the paper "ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning… ☆17 Updated last year
- Code release for DeepJudge (S&P'22) ☆51 Updated last year
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆48 Updated 3 months ago
- [USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a… ☆56 Updated last month
- Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT ☆26 Updated last year
- Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] ☆45 Updated last month
- Distribution Preserving Backdoor Attack in Self-supervised Learning ☆12 Updated 9 months ago
- This is the source code for MEA-Defender. Our paper is accepted by the IEEE Symposium on Security and Privacy (S&P) 2024. ☆15 Updated last year
- All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks ☆15 Updated 7 months ago
- Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts" ☆36 Updated 4 months ago
- ☆23 Updated 3 years ago
- ☆18 Updated 8 months ago
- ☆12 Updated 6 months ago
- A curated list of trustworthy Generative AI papers. Daily updating... ☆67 Updated 2 months ago
- ☆23 Updated 2 years ago
- BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models ☆74 Updated 2 months ago
- [USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models ☆94 Updated last month
- Code for paper "SrcMarker: Dual-Channel Source Code Watermarking via Scalable Code Transformations" (IEEE S&P 2024) ☆21 Updated 3 months ago
- Hidden backdoor attack on NLP systems ☆46 Updated 3 years ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM ☆46 Updated 3 weeks ago
- ☆9 Updated 3 years ago
- ☆62 Updated 4 years ago
- ☆13 Updated 2 years ago
- ☆40 Updated last year
- Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794… ☆12 Updated 3 months ago
- ☆23 Updated last year