YiyiyiZhao / sirenLinks

Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models (LLMs). This repository contains the resources for reproducing the experiments described in our work.

☆13

Alternatives and similar repositories for siren

Users that are interested in siren are comparing it to the libraries listed below

Sorting:

CryptoAILab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆179Updated 5 months ago
Dtc7w3PQ / Visco-Attack
Official implementation of Visco-Attack (EMNLP 2025 Main). We will progressively release the code and one-click reproduction scripts.
☆26Updated 3 months ago
roywang021 / IDEATOR
Code for ICCV2025 paper——IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
☆15Updated 4 months ago
tmllab / 2025_ICLR_PiF
☆37Updated 6 months ago
kriti-hippo / red_queen
Red Queen Dataset and data generation template
☆20Updated last year
RUCAIBox / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆33Updated last year
SaFoLab-WISC / JailBreakV_28K
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
☆81Updated 6 months ago
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
☆66Updated 7 months ago
DSN-2024 / DSN
DSN jailbreak Attack & Evaluation Ensemble
☆14Updated 2 weeks ago
yuplin2333 / representation-space-jailbreak
Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794…
☆22Updated last year
zihao-ai / unthinking_vulnerability
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
☆32Updated 6 months ago
ledllm / ledllm
☆23Updated last year
AI45Lab / ActorAttack
☆111Updated 9 months ago
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆177Updated last year
PKU-ML / PAT
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆21Updated 6 months ago
NY1024 / Foundation-Model-Paper-Notes
☆69Updated 6 months ago
neelsjain / baseline-defenses
Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models"
☆28Updated 2 years ago
erfanshayegani / Jailbreak-In-Pieces
[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…
☆76Updated last year
roywang021 / UMK
Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models
☆30Updated 11 months ago
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
☆53Updated last year
abc03570128 / Jailbreaking-Attack-against-Multimodal-Large-Language-Model
☆52Updated last year
AI-secure / MMDT
Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models
☆24Updated 8 months ago
ASTRAL-Group / ASTRA
[CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre…
☆46Updated 4 months ago
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29Updated last year
TeamPigeonLab / CS-DJ
Accept by CVPR 2025 (highlight)
☆22Updated 5 months ago
listen0425 / Safety-Layers
code space of paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025)
☆16Updated 7 months ago
wangyu-ovo / MML
Code for the paper "Jailbreak Large Vision-Language Models Through Multi-Modal Linkage"
☆25Updated 11 months ago
MaTengSYSU / HIMRD-jailbreak
Code repository for the paper "Heuristic Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models"
☆13Updated 3 months ago
salman-lui / x-teaming
☆47Updated 6 months ago
Lyz1213 / BadEdit
☆36Updated last year