Official implementation of "TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards"
☆29 · Apr 13, 2026 · Updated 2 months ago
Alternatives and similar repositories for TROJail
Users that are interested in TROJail are comparing it to the libraries listed below.
- ☆21 · May 14, 2025 · Updated last year
- Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models … ☆15 · Sep 12, 2025 · Updated 9 months ago
- [CVPR2025] Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters ☆43 · Mar 11, 2025 · Updated last year
- ☆13 · Mar 11, 2025 · Updated last year
- [TPAMI2025] BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors ☆16 · Apr 23, 2025 · Updated last year
- ☆22 · Jan 26, 2024 · Updated 2 years ago
- [AAAI 2026] This is the official implementation of the paper "ExtendAttack: Attacking Servers of LRMs via Extending Reasoning". ☆23 · Mar 18, 2026 · Updated 2 months ago
- [ICML 2025] Official code of "DAMA: Data- and Model-aware Alignment of Multi-modal LLMs" ☆16 · May 24, 2025 · Updated last year
- Personalized Image Generation with Large Multimodal Models ☆17 · May 13, 2025 · Updated last year
- ☆60 · Apr 9, 2026 · Updated 2 months ago
- We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench show… ☆65 · Feb 4, 2026 · Updated 4 months ago
- ☆13 · Feb 25, 2025 · Updated last year
- [ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data ☆14 · Feb 26, 2025 · Updated last year
- USTC Fundamentals of Algorithms, Spring 2022 labs ☆12 · Jun 10, 2022 · Updated 4 years ago
- ☆26 · Jan 5, 2026 · Updated 5 months ago
- Official Implementation of implicit reference attack ☆11 · Oct 16, 2024 · Updated last year
- Implementation of the Implicit Knowledge Extraction Attack. ☆23 · Apr 17, 2026 · Updated last month
- The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models. ☆15 · Dec 16, 2024 · Updated last year
- ☆27 · Oct 27, 2025 · Updated 7 months ago
- Focused Papers, Delivered Simply :) ☆55 · Dec 25, 2025 · Updated 5 months ago
- [EMNLP 2024 Findings] Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information ☆13 · Oct 1, 2024 · Updated last year
- ☆12 · Oct 29, 2023 · Updated 2 years ago
- [KDD'25] Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective ☆110 · Feb 28, 2026 · Updated 3 months ago
- 🌟 A step-by-step guide to inserting code links in papers ☆25 · Aug 2, 2025 · Updated 10 months ago
- ☆12 · Nov 12, 2024 · Updated last year
- [AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark ☆30 · Apr 4, 2026 · Updated 2 months ago
- Official repository for the paper "Gradient-based Jailbreak Images for Multimodal Fusion Models" (https://arxiv.org/abs/2410.03489) ☆20 · Oct 22, 2024 · Updated last year
- ☆31 · Mar 16, 2025 · Updated last year
- [ICLR'26] SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models ☆40 · Mar 9, 2026 · Updated 3 months ago
- 🌤️ WeChat mini program: weather visualization ❄️ ☆11 · Dec 28, 2021 · Updated 4 years ago
- [ICLR'26] Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play? ☆53 · Mar 9, 2026 · Updated 3 months ago
- Universal preflight security scanner for AI coding agents: detects hooks injection, credential exfiltration & backdoors in .cursorrules,… ☆72 · May 29, 2026 · Updated 2 weeks ago
- Data and code for the paper: Finding Safety Neurons in Large Language Models ☆29 · Jan 29, 2026 · Updated 4 months ago
- A new heuristic to optimize implementations of linear matrices ☆20 · Jan 2, 2023 · Updated 3 years ago
- 「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation ☆22 · Jul 2, 2024 · Updated last year
- An interactive attention visualization and intervention tool for LLM Decode Stage. ☆48 · Jan 6, 2026 · Updated 5 months ago
- Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks ☆78 · May 7, 2026 · Updated last month
- PFI: Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents ☆28 · Mar 26, 2025 · Updated last year
- The official codebase for our paper, FLEX: Continuous Agent Evolution via Forward Learning from Experience. ☆81 · Jun 9, 2026 · Updated last week