ChengshuaiZhao0 / The-Wolf-Within
⭐10 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for The-Wolf-Within
- ⭐24 · Updated 3 months ago
- [ICLR 2024 Spotlight 🔥] - [Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal… ⭐24 · Updated 5 months ago
- [ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking… ⭐21 · Updated 3 weeks ago
- ⭐17 · Updated last week
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models ⭐43 · Updated 7 months ago
- Official code for "TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization", CVPR 2023 ⭐13 · Updated last year
- ⭐30 · Updated 4 months ago
- [ICLR 2023] Official repository of the paper "Rethinking the Effect of Data Augmentation in Adversarial Contrastive Learning" ⭐17 · Updated last year
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ⭐67 · Updated 11 months ago
- Code for NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" ⭐28 · Updated last month
- The official implementation of ECCV'24 paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Uns… ⭐57 · Updated last week
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models. ⭐45 · Updated 2 months ago
- [ICLR 2024 Oral] Less is More: Fewer Interpretable Region via Submodular Subset Selection ⭐72 · Updated last month
- Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794… ⭐11 · Updated 3 months ago
- Official PyTorch implementation of "CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning" @ ICCV 2023 ⭐28 · Updated 10 months ago
- ⭐56 · Updated last month
- Official implementation of NeurIPS'24 paper "Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Model… ⭐25 · Updated last week
- ⭐13 · Updated 4 months ago
- [ACL 2024] Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models. Detect and mitigate object hallucinatio… ⭐16 · Updated 4 months ago
- The official repository for paper "MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance" ⭐31 · Updated 6 months ago
- ⭐15 · Updated this week
- One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models ⭐37 · Updated 5 months ago
- ⭐38 · Updated last year
- [arXiv 2024] Adversarial attacks on multimodal agents ⭐38 · Updated 4 months ago
- [NeurIPS-2023] Annual Conference on Neural Information Processing Systems ⭐161 · Updated last year
- ⭐12 · Updated 3 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… ⭐42 · Updated 4 months ago
- ⭐21 · Updated 5 months ago
- ⭐12 · Updated 6 months ago
- ⭐37 · Updated 3 months ago