chen37058/Red-Team-Arxiv-Paper-Update

Awesome jailbreak and red-teaming arXiv papers (automatically updated every 12 hours)

☆118

Alternatives and similar repositories for Red-Team-Arxiv-Paper-Update

Users who are interested in Red-Team-Arxiv-Paper-Update are comparing it to the repositories listed below.

NY1024 / Jailbreak_GPT4o
☆28 · Jun 5, 2024 · Updated 2 years ago
liuxuannan / Awesome-Multimodal-Jailbreak
A Survey on Jailbreak Attacks and Defenses against Multimodal Generative Models
☆333 · Jan 11, 2026 · Updated 6 months ago
isXinLiu / Awesome-MLLM-Safety
Accepted by IJCAI-24 Survey Track
☆233 · Aug 25, 2024 · Updated last year
Dtc7w3PQ / Response-Attack
Official implementation of “Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models” (AAAI 2026).
☆37 · Mar 22, 2026 · Updated 4 months ago
CryptoAILab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆211 · Jun 26, 2025 · Updated last year
YiyiyiZhao / siren
Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models …
☆15 · Jun 14, 2026 · Updated last month
Rookie143 / BadRobot
This is the official repository for the ICLR 2025 accepted paper Badrobot: Manipulating Embodied LLMs in the Physical World.
☆46 · Jun 11, 2026 · Updated last month
Trustworthy-AI-Group / Adversarial_Examples_Papers
A list of recent papers about adversarial learning
☆372 · Updated this week
GuanlinLee / ART
Official Code for ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users (NeurIPS 2024)
☆25 · Oct 23, 2024 · Updated last year
TrustAI-laboratory / Many-Shot-Jailbreaking-Demo
Research on "Many-Shot Jailbreaking" in Large Language Models (LLMs). It unveils a novel technique capable of bypassing the safety mechan…
☆17 · Aug 6, 2024 · Updated last year
wbopan / safety-residual-space
Multi-dimensional analysis of orthogonal safety directions in LLM alignment
☆23 · Jun 12, 2026 · Updated last month
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
☆61 · Jun 5, 2024 · Updated 2 years ago
SaFo-Lab / seclaw
🦾 SeClaw: The Security Armored Personal AI Assistant
☆31 · Mar 18, 2026 · Updated 4 months ago
Privatris / AgentLeak
AgentLeak: Open benchmark for privacy leakage in LLM agents (7 channels, multi-agent, multi-framework).
☆25 · Jul 1, 2026 · Updated 3 weeks ago
franciscoliu / SKU
Official code implementation of SKU, Accepted by ACL 2024 Findings
☆20 · Dec 18, 2024 · Updated last year
ZhentingWang / DIAGNOSIS
☆23 · Apr 23, 2024 · Updated 2 years ago
tmllab / 2025_ICLR_PiF
☆40 · May 17, 2025 · Updated last year
yueliu1999 / Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, data…
☆1,547 · Jul 22, 2026 · Updated last week
leigest519 / HiddenDetect
ACL 2025 (Main) HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States
☆165 · Jun 8, 2025 · Updated last year
SheltonLiu-N / Universal-Prompt-Injection
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆73 · Oct 23, 2024 · Updated last year
RUCAIBox / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆39 · Oct 23, 2024 · Updated last year
SaFo-Lab / Awesome-T2I-safety-Papers
List of T2I safety papers, updated daily, welcome to discuss using Discussions
☆68 · Aug 12, 2024 · Updated last year
Sadcardation / ImageProtector
Repository for the Paper: Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Inj…
☆19 · Apr 17, 2026 · Updated 3 months ago
UCSC-VLAA / AttnGCG-attack
[TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
☆27 · Jun 17, 2025 · Updated last year
s-ball-10 / jailbreak_dynamics
☆25 · Jun 13, 2024 · Updated 2 years ago
leileqiTHU / Attacker
The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1
☆13 · Apr 23, 2025 · Updated last year
qizhangli / MoreBayesian-attack
Code for our ICLR 2023 paper Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples.
☆18 · May 31, 2023 · Updated 3 years ago
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆87 · Nov 3, 2024 · Updated last year
Lyz1213 / Backdoored_PPLM
☆15 · Dec 12, 2023 · Updated 2 years ago
chiayi-hsu / Ring-A-Bell
☆46 · Jan 15, 2025 · Updated last year
Unispac / Visual-Adversarial-Examples-Jailbreak-Large-Language-Models
Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models
☆282 · May 13, 2024 · Updated 2 years ago
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
☆80 · Mar 30, 2025 · Updated last year
Vinsonzyh / BlueSuffix
[ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
☆31 · Nov 2, 2025 · Updated 8 months ago
EVIGBYEN / Mousetrap
☆18 · Jul 3, 2025 · Updated last year
Yunhao-Feng / AgentHazard
☆29 · Jun 13, 2026 · Updated last month
Jinxiaolong1129 / Foot-in-the-door-Jailbreak
☆23 · May 14, 2025 · Updated last year
Hanpx20 / SafeSwitch
Official code repository for the paper "Internal Activation as the Polar Star for Steering Unsafe LLM Behavior"
☆15 · May 31, 2026 · Updated last month
Zsbyqx20 / AgentHazard
Mobile GUI Agents under Real-world Threats: Are We There Yet?
☆17 · May 18, 2026 · Updated 2 months ago
Jacobhhy / Agent-Memory-Poisoning
☆21 · Dec 11, 2025 · Updated 7 months ago