Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models (LLMs). This repository contains the resources for reproducing the experiments described in our work.
☆15 · Jun 14, 2026 · Updated this week
Alternatives and similar repositories for siren
Users interested in siren are comparing it to the repositories listed below.
- ☆21 · May 14, 2025 · Updated last year
- Official implementation of "TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards" ☆29 · Apr 13, 2026 · Updated 2 months ago
- Red Queen Dataset and data generation template ☆26 · Dec 26, 2025 · Updated 5 months ago
- ☆27 · Mar 17, 2025 · Updated last year
- Official repository for the paper "Gradient-based Jailbreak Images for Multimodal Fusion Models" (https://arxiv.org/abs/2410.03489) ☆20 · Oct 22, 2024 · Updated last year
- [TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation ☆26 · Jun 17, 2025 · Updated last year
- ☆136 · Feb 3, 2025 · Updated last year
- Code for the paper PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo: 124.220.228.133:11107 ☆21 · Aug 10, 2024 · Updated last year
- ☆67 · May 21, 2025 · Updated last year
- Code for the ACM MM 2024 paper White-box Multimodal Jailbreaks Against Large Vision-Language Models ☆33 · Dec 30, 2024 · Updated last year
- Repository for using the model https://huggingface.co/thu-coai/Attacker-v0.1 ☆13 · Apr 23, 2025 · Updated last year
- Auto1111 port of NVlab's adversarial purification method that uses the forward and reverse processes of diffusion models to remove advers… ☆13 · Aug 8, 2023 · Updated 2 years ago
- ☆22 · Jul 26, 2025 · Updated 10 months ago
- ☆59 · May 30, 2024 · Updated 2 years ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM ☆87 · Nov 3, 2024 · Updated last year
- Code repository for our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794… ☆24 · Jul 26, 2024 · Updated last year
- ☆14 · Oct 7, 2022 · Updated 3 years ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆388 · Jan 23, 2025 · Updated last year
- Generalized Optimal Transport Attention with Trainable Priors ☆69 · Jan 25, 2026 · Updated 4 months ago
- ☆32 · Mar 16, 2025 · Updated last year
- Cloud Music user information visualization ☆14 · Nov 18, 2022 · Updated 3 years ago
- Code for Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack (TPAMI 2025) ☆42 · Aug 28, 2025 · Updated 9 months ago
- ☆18 · Jun 4, 2025 · Updated last year
- ☆136 · Dec 3, 2025 · Updated 6 months ago
- ☆16 · Sep 1, 2025 · Updated 9 months ago
- ☆15 · Aug 7, 2025 · Updated 10 months ago
- Code for the Findings of EMNLP 2023 paper Multi-step Jailbreaking Privacy Attacks on ChatGPT ☆37 · Oct 15, 2023 · Updated 2 years ago
- [ICML 2025] Official source code for the paper "FlipAttack: Jailbreak LLMs via Flipping" ☆172 · May 2, 2025 · Updated last year
- ☆33 · Jun 24, 2024 · Updated last year
- ☆10 · Apr 29, 2020 · Updated 6 years ago
- Adversarial Attack for Pre-trained Code Models ☆10 · Jul 19, 2022 · Updated 3 years ago
- Official implementation of implicit reference attack ☆11 · Oct 16, 2024 · Updated last year
- ☆24 · May 23, 2025 · Updated last year
- Repository for the paper Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models ☆15 · Dec 16, 2024 · Updated last year
- [EMNLP 2024 Findings] Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information ☆13 · Oct 1, 2024 · Updated last year
- Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection" ☆27 · Jul 6, 2024 · Updated last year
- Official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang… ☆160 · Sep 2, 2025 · Updated 9 months ago
- 2024 Shandong University undergraduate thesis LaTeX template ☆18 · Jan 2, 2026 · Updated 5 months ago
- ☆12 · Oct 29, 2023 · Updated 2 years ago