gitkolento / Adversarial-Attacks-on-LLMs
A summary of adversarial attacks against large language models
☆16 · Updated last year
Alternatives and similar repositories for Adversarial-Attacks-on-LLMs:
Users interested in Adversarial-Attacks-on-LLMs are comparing it to the libraries listed below.
- ☆77 · Updated 9 months ago
- This GitHub repository summarizes a list of research papers on AI security from the four top academic conferences. ☆105 · Updated last year
- [USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a… ☆62 · Updated 3 months ago
- Simple PyTorch implementations of BadNets on MNIST and CIFAR10. ☆167 · Updated 2 years ago
- Official code for our NDSS paper "Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarkin… ☆25 · Updated 2 months ago
- ☆33 · Updated last month
- ☆23 · Updated 4 months ago
- ☆26 · Updated last month
- ☆16 · Updated 3 months ago
- AI Model Security Reading Notes ☆35 · Updated 6 months ago
- Red Queen Dataset and data generation template ☆10 · Updated 3 months ago
- A curated list of papers & resources on backdoor attacks and defenses in deep learning. ☆192 · Updated 10 months ago
- The most comprehensive and accurate LLM jailbreak attack benchmark to date ☆13 · Updated 2 months ago
- Repository for "Towards Codable Watermarking for Large Language Models" ☆34 · Updated last year
- This is the source code for Data-free Backdoor. Our paper was accepted at the 32nd USENIX Security Symposium (USENIX Security 2023). ☆31 · Updated last year
- Invisible Backdoor Attack with Sample-Specific Triggers ☆93 · Updated 2 years ago
- ☆15 · Updated 2 weeks ago
- MASTERKEY is a framework designed to explore and exploit vulnerabilities in large language model chatbots by automating jailbreak attacks… ☆18 · Updated 4 months ago
- Source code and scripts for the paper "Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks" ☆15 · Updated last month
- A curated list of papers & resources on data poisoning, backdoor attacks, and defenses against them (no longer maintained) ☆219 · Updated 2 weeks ago
- The automated prompt injection framework for LLM-integrated applications. ☆179 · Updated 4 months ago
- 😎 An up-to-date, curated list of awesome papers, methods & resources on attacks against Large Vision-Language Models. ☆193 · Updated 3 weeks ago
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (IEEE S&P 2024) ☆34 · Updated 10 months ago
- AdvDoor: Adversarial Backdoor Attack of Deep Learning System ☆32 · Updated 2 months ago
- [USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models ☆111 · Updated 3 months ago
- ☆49 · Updated last month
- ☆12 · Updated last year
- ☆24 · Updated 3 years ago
- A reproduction of the Neural Cleanse paper; genuinely simple and effective. Posted on okaland. ☆30 · Updated 3 years ago