xxiqiao/TROJail

Official implementation of "TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards"

☆31

Alternatives and similar repositories for TROJail

Users who are interested in TROJail are comparing it to the libraries listed below.

YiyiyiZhao / siren
Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models …
☆15 · Jun 14, 2026 · Updated last month
salman-lui / x-teaming
☆67 · May 21, 2025 · Updated last year
NY1024 / RACE
☆27 · Mar 17, 2025 · Updated last year
HanjiangHu / NBF-LLM
The official code for "Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks".
☆18 · Jun 24, 2026 · Updated last month
pwnhyo / T-MAP
☆17 · Mar 25, 2026 · Updated 4 months ago
albert-y1n / PISmith
PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses
☆22 · Jul 17, 2026 · Updated last week
kriti-hippo / red_queen
Red Queen Dataset and data generation template
☆27 · Dec 26, 2025 · Updated 6 months ago
shiningrain / JailGuard
☆32 · Mar 16, 2025 · Updated last year
yuki-younai / MTSA
Official implementation of MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
☆16 · Jun 2, 2025 · Updated last year
WYuan1001 / AdaVD
[CVPR2025] Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
☆44 · Mar 11, 2025 · Updated last year
CHATS-lab / ToolShield
[ICML 2026] Official implementation for paper "Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Ag…
☆29 · Jul 6, 2026 · Updated 2 weeks ago
tu-tuing / SlowBA
[🏆ECCV'26] Official Repo for SlowBA: An efficiency backdoor attack towards VLM-based GUI agents
☆15 · Jul 1, 2026 · Updated 3 weeks ago
tmllab / 2025_ICLR_PiF
☆40 · May 17, 2025 · Updated last year
TrustAIRLab / HarmfulSkillBench
The Official Repository for Paper "HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?"
☆15 · May 2, 2026 · Updated 2 months ago
UCSC-VLAA / AttnGCG-attack
[TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
☆27 · Jun 17, 2025 · Updated last year
TanqiuJiang / AgentLAB
The official implementation of the paper "AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks"
☆27 · Jun 1, 2026 · Updated last month
Jinxiaolong1129 / Foot-in-the-door-Jailbreak
☆23 · May 14, 2025 · Updated last year
RPC2 / AutoInject
☆20 · Jun 12, 2026 · Updated last month
xunguangwang / SoK4JailbreakGuardrails
[S&P 2026] SoK: Evaluating Jailbreak Guardrails for Large Language Models
☆44 · Dec 17, 2025 · Updated 7 months ago
Mr-Peach0301 / Flower
☆14 · Mar 11, 2025 · Updated last year
cxfann / Flame
☆15 · May 19, 2026 · Updated 2 months ago
jiaxiaojunQAQ / SkillJect
SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement
☆73 · Jun 11, 2026 · Updated last month
xiZAIzai / JailExpert
The official repository for JailExpert
☆23 · Sep 9, 2025 · Updated 10 months ago
AI45Lab / MAGIC
Code for paper "MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety"
☆51 · May 11, 2026 · Updated 2 months ago
agiresearch / iAgent
[ACL 2025] iAgent: LLM Agent as a Shield between User and Recommender Systems
☆34 · May 23, 2025 · Updated last year
aisa-group / promptinject-agent-skills
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
☆21 · Jul 2, 2026 · Updated 3 weeks ago
AI45Lab / ActorAttack
☆134 · Jun 29, 2026 · Updated 3 weeks ago
Bowen1911 / xJailbreak
Code for the paper "xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking"
☆17 · Apr 3, 2026 · Updated 3 months ago
NExTplusplus / L2I
The baseline method for CCIR 22: https://www.datafountain.cn/competitions/573
☆13 · Aug 2, 2022 · Updated 3 years ago
Alkhatibnatasha / SOMEIP_IDS
RNN-based IDS for SOME/IP Intrusion Detection
☆10 · Jul 20, 2021 · Updated 5 years ago
DSN-2024 / DSN
DSN jailbreak Attack & Evaluation Ensemble
☆17 · Feb 7, 2026 · Updated 5 months ago
necst / CANflict
☆16 · Sep 20, 2022 · Updated 3 years ago
ZhihongShao / RECTIFY
Code and models for "Answering Open-Domain Multi-Answer Questions via a Recall-then-Verify Framework" (ACL 2022)
☆12 · Jun 29, 2022 · Updated 4 years ago
bigglesworthnotacat / LLM-Steg
[ICLR 2026 Oral] Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
☆20 · Mar 22, 2026 · Updated 4 months ago
maowenyu-11 / RPP
☆22 · Jan 26, 2024 · Updated 2 years ago
thu-coai / TransferAttack
[ACL 2025] Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
☆19 · May 23, 2025 · Updated last year
jianshuod / SafeSearch
[ICML 2026] Official implementations of "SafeSearch: Automated Red-Teaming of LLM-Based Search Agents"
☆19 · Mar 25, 2026 · Updated 4 months ago
Zhow01 / SkillAttack
☆52 · May 19, 2026 · Updated 2 months ago
protectskills / MaliciousAgentSkillsBench
A Security Benchmark for Claude Code Agent Skills
☆69 · Updated this week