Bowen1911/xJailbreak

Code for the paper "xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking"

☆17

Alternatives and similar repositories for xJailbreak

Users who are interested in xJailbreak are comparing it to the repositories listed below.

NY1024 / Jailbreak_GPT4o
☆28 · Jun 5, 2024 · Updated 2 years ago
TeamPigeonLab / CS-DJ
Accepted by CVPR 2025 (highlight)
☆25 · Jun 8, 2025 · Updated last year
aaFrostnova / Papillon
[Usenix Security 2025] Official repo of paper PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs
☆69 · Nov 17, 2025 · Updated 8 months ago
grasses / PoisonPrompt
Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo//124.220.228.133:11107
☆21 · Aug 10, 2024 · Updated last year
thu-coai / TransferAttack
[ACL 2025] Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
☆19 · May 23, 2025 · Updated last year
Lucas-TY / llm_Implicit_reference
Official Implementation of implicit reference attack
☆11 · Oct 16, 2024 · Updated last year
zhaoshiji123 / SI-Attack
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
☆16 · Aug 6, 2025 · Updated 11 months ago
tmllab / 2025_ICLR_PiF
☆40 · May 17, 2025 · Updated last year
neuron-insight-lab / LoopLLM
[AAAI 2026] The official code for "LoopLLM: Transferable Energy-Latency Attacks in LLMs via Repetitive Generation"
☆17 · Mar 20, 2026 · Updated 4 months ago
theshi-1128 / jailbreak-bench
The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆21 · Mar 22, 2025 · Updated last year
PKU-ML / PAT
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆22 · May 6, 2025 · Updated last year
YancyKahn / CoA
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆39 · Jan 17, 2025 · Updated last year
papersPapers / BadPrompt
Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts"
☆41 · Jul 8, 2024 · Updated 2 years ago
aisa-group / promptinject-agent-skills
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
☆21 · Jul 2, 2026 · Updated 3 weeks ago
TrustMLRG / GASP
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
☆16 · Nov 12, 2025 · Updated 8 months ago
pwnhyo / T-MAP
☆18 · Mar 25, 2026 · Updated 4 months ago
DSN-2024 / DSN
DSN jailbreak Attack & Evaluation Ensemble
☆17 · Feb 7, 2026 · Updated 5 months ago
ASTRAL-Group / ASTRA
[CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre…
☆62 · Jul 5, 2025 · Updated last year
ShenzheZhu / JailDAM
[COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
☆26 · Nov 25, 2025 · Updated 8 months ago
facebookresearch / multimodal-fusion-jailbreaks
Official repository for the paper "Gradient-based Jailbreak Images for Multimodal Fusion Models" (https://arxiv.org/abs/2410.03489)
☆20 · Oct 22, 2024 · Updated last year
bigglesworthnotacat / LLM-Steg
[ICLR 2026 Oral] Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
☆20 · Mar 22, 2026 · Updated 4 months ago
ethz-spylab / rlhf-poisoning
Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
☆67 · Apr 24, 2024 · Updated 2 years ago
Sbwillbealier / qa-rag-demo
Implementing RAG Knowledge Base with Langchain
☆14 · Nov 7, 2024 · Updated last year
wangyu-ovo / MML
Code for the paper "Jailbreak Large Vision-Language Models Through Multi-Modal Linkage"
☆35 · Dec 6, 2024 · Updated last year
abc03570128 / Jailbreaking-Attack-against-Multimodal-Large-Language-Model
☆63 · Aug 11, 2024 · Updated last year
Zhow01 / SkillAttack
☆52 · May 19, 2026 · Updated 2 months ago
Claw-Guard / ClawGuard
☆24 · May 12, 2026 · Updated 2 months ago
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆56 · Jul 24, 2024 · Updated 2 years ago
ole-knf / A-bidirectional-GPT-approach-for-detecting-malicious-network-traffic
This approach of Intrusion Detection uses two GPT models, which are trained on normal network traffic, to predict sequences of communicat…
☆11 · Oct 3, 2023 · Updated 2 years ago
theshi-1128 / ReDPJ
A novel jailbreak attack unveiling an overlooked attack surface inherently in the chain-of-thought reasoning trajectory of LLMs
☆22 · Apr 3, 2026 · Updated 3 months ago
BIIIANG / HUST-CSE-SecurityOfComputerNetworkExperiments-2022
Huazhong University of Science and Technology, School of Cyber Science and Engineering, Computer Network Security Experiments, Spring 2022
☆10 · Aug 28, 2022 · Updated 3 years ago
matthewwicker / Kryptonite-N
Coursework for Mathematics for Machine Learning (70015) at Imperial College London
☆10 · Nov 12, 2024 · Updated last year
xxiqiao / TROJail
Official implementation of "TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards"
☆31 · Jul 20, 2026 · Updated last week
UNHSAILLab / working-memory-attack-on-llms
Working Memory Attack on LLMs
☆18 · May 27, 2025 · Updated last year
ytyz1307zzh / IHEval
Code and data for NAACL 2025 paper "IHEval: Evaluating Language Models on Following the Instruction Hierarchy"
☆18 · Feb 25, 2025 · Updated last year
kriti-hippo / red_queen
Red Queen Dataset and data generation template
☆27 · Dec 26, 2025 · Updated 7 months ago
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29 · Jul 9, 2024 · Updated 2 years ago
TanqiuJiang / AgentLAB
The official implementation of the paper "AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks"
☆28 · Jun 1, 2026 · Updated last month
Lee-CBG / ATM-TCR
☆12 · Mar 22, 2024 · Updated 2 years ago