jiaxiaojunQAQ / I-GCG
Improved techniques for optimization-based jailbreaking on large language models
☆61 · Updated 7 months ago
Alternatives and similar repositories for I-GCG:
Users interested in I-GCG are comparing it to the repositories listed below.
- [arXiv 2024] Official source code for the paper "FlipAttack: Jailbreak LLMs via Flipping". ☆86 · Updated 2 months ago
- [ICML22] "Revisiting and Advancing Fast Adversarial Training through the Lens of Bi-level Optimization" by Yihua Zhang*, Guanhua Zhang*, … ☆62 · Updated 2 years ago
- Improving fast adversarial training with prior-guided knowledge (TPAMI2024) ☆32 · Updated 8 months ago
- [CCS'24] SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models ☆84 · Updated 3 months ago
- Revisiting and Exploring Efficient Fast Adversarial Training via LAW: Lipschitz Regularization and Auto Weight Averaging (TIFS2024) ☆32 · Updated 7 months ago
- Code for "Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack" ☆20 · Updated 2 months ago
- Practical Detection of Trojan Neural Networks ☆117 · Updated 4 years ago
- [CVPR2024] MMA-Diffusion: MultiModal Attack on Diffusion Models ☆137 · Updated 9 months ago
- YiJian-Community: a full-process automated large model safety evaluation tool designed for academic research ☆102 · Updated 3 months ago
- A comprehensive collection of resources on addressing and understanding hallucination phenomena in MLLMs ☆35 · Updated 8 months ago
- Attacking classification models with transferability; black-box and unrestricted adversarial attacks on ImageNet (CVPR 2021, Security AI Challenger Program, Phase 6: …) ☆48 · Updated 3 years ago
- ☆62 · Updated last month
- ☆55 · Updated last year
- [ICLR 2023] Official TensorFlow implementation of "Distributionally Robust Post-hoc Classifiers under Prior Shifts" ☆32 · Updated 11 months ago
- [NeurIPS22] "Advancing Model Pruning via Bi-level Optimization" by Yihua Zhang*, Yuguang Yao*, Parikshit Ram, Pu Zhao, Tianlong Chen, Min… ☆112 · Updated last year
- [USENIX Security '24] Dataset of real-world malicious LLM applications, including 45 malicious prompts for generating malici… ☆53 · Updated 3 months ago
- Code for "Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks" (TIFS2024) ☆12 · Updated 9 months ago
- Code repo for the paper "Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis" (https://arxiv.org/abs/2406.10794…) ☆18 · Updated 5 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆53 · Updated last week
- Code for the paper "From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models" ☆63 · Updated this week
- [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion ☆32 · Updated 2 months ago
- CVPR 2022 Workshop on Robust Classification ☆78 · Updated 2 years ago
- SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation ☆53 · Updated last month
- This repo contains my customised Python-based plot styles for NLP papers, along with my reproductions of my favourite papers' plots ☆36 · Updated 10 months ago
- Benchmarking LLMs via Uncertainty Quantification ☆201 · Updated 11 months ago
- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response ☆37 · Updated 3 weeks ago
- [ACL 2024] CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and … ☆97 · Updated 5 months ago
- All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks ☆16 · Updated 8 months ago
- ☆27 · Updated 6 months ago
- Submission guide and discussion board for the AI Singapore Global Challenge for Safe and Secure LLMs (Track 1A) ☆16 · Updated 6 months ago