tmllab/2025_ICLR_PiF

tmllab / 2025_ICLR_PiF

☆40

Alternatives and similar repositories for 2025_ICLR_PiF

Users who are interested in 2025_ICLR_PiF are comparing it to the repositories listed below.

tmllab / 2023_NeurIPS_AAER
☆12 · Feb 15, 2025 · Updated last year
YiyiyiZhao / siren
Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models …
☆15 · Jun 14, 2026 · Updated last month
thu-coai / TransferAttack
[ACL 2025] Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
☆19 · May 23, 2025 · Updated last year
Jinxiaolong1129 / Foot-in-the-door-Jailbreak
☆23 · May 14, 2025 · Updated last year
ZhenchenWan / MF-VITON-High-Fidelity-Mask-Free-Virtual-Try-On-with-Minimal-Input
☆14 · Mar 12, 2025 · Updated last year
salman-lui / x-teaming
☆68 · May 21, 2025 · Updated last year
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
☆61 · Jun 5, 2024 · Updated 2 years ago
zhipeng-wei / EmojiAttack
Emoji Attack [ICML 2025]
☆45 · Jul 15, 2025 · Updated last year
UCSC-VLAA / AttnGCG-attack
[TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
☆27 · Jun 17, 2025 · Updated last year
AIM-Intelligence / Automated-Multi-Turn-Jailbreaks
☆139 · Dec 3, 2025 · Updated 7 months ago
tmllab / 2021_ICCV_Me-Momentum
☆14 · Jan 7, 2023 · Updated 3 years ago
tml-epfl / llm-adaptive-attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
☆391 · Jan 23, 2025 · Updated last year
HanjiangHu / NBF-LLM
The official code for "Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks".
☆18 · Jun 24, 2026 · Updated last month
erfanshayegani / Jailbreak-In-Pieces
[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…
☆93 · Jun 6, 2024 · Updated 2 years ago
AI45Lab / ActorAttack
☆135 · Jun 29, 2026 · Updated 3 weeks ago
yueliu1999 / FlipAttack
[ICML 2025] Official source code for the paper "FlipAttack: Jailbreak LLMs via Flipping".
☆178 · May 2, 2025 · Updated last year
Bowen1911 / xJailbreak
Code for the paper "xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking"
☆17 · Apr 3, 2026 · Updated 3 months ago
a5507203 / Instance-dependent-Label-noise-Learning-under-a-Structural-Causal-Model
☆17 · Nov 27, 2023 · Updated 2 years ago
tmllab / 2025_ICLR_FLOW
☆64 · Nov 30, 2025 · Updated 7 months ago
NY1024 / RACE
☆27 · Mar 17, 2025 · Updated last year
xirui-li / DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆68 · Aug 25, 2024 · Updated last year
ShenzheZhu / JailDAM
[COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
☆26 · Nov 25, 2025 · Updated 7 months ago
dsbowen / strong_reject
☆146 · Jul 7, 2025 · Updated last year
xxiqiao / TROJail
Official implementation of "TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards"
☆31 · Updated this week
weiyezhimeng / SQL-Injection-Jailbreak
☆22 · Jul 26, 2025 · Updated 11 months ago
JailbreakBench / jailbreakbench
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]
☆634 · Apr 4, 2025 · Updated last year
RUCAIBox / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆39 · Oct 23, 2024 · Updated last year
tmllab / 2022_NeurIPS_PICMM
☆21 · Jan 28, 2023 · Updated 3 years ago
TeamPigeonLab / CS-DJ
Accepted by CVPR 2025 (highlight)
☆25 · Jun 8, 2025 · Updated last year
huizhang-L / CodeChameleon
☆30 · Mar 20, 2024 · Updated 2 years ago
liuxuannan / Awesome-Multimodal-Jailbreak
A Survey on Jailbreak Attacks and Defenses against Multimodal Generative Models
☆332 · Jan 11, 2026 · Updated 6 months ago
YitingQu / unsafe-diffusion
☆50 · Jul 14, 2024 · Updated 2 years ago
TrustMLRG / GASP
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
☆16 · Nov 12, 2025 · Updated 8 months ago
SaFo-Lab / JailBreakV_28K
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
☆96 · May 9, 2025 · Updated last year
kriti-hippo / red_queen
Red Queen Dataset and data generation template
☆27 · Dec 26, 2025 · Updated 6 months ago
PKU-ML / PAT
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆22 · May 6, 2025 · Updated last year
tmllab / 2021_NeurIPS_PES
☆30 · Jan 7, 2023 · Updated 3 years ago
CosmosYi / ReasoningShield
ReasoningShield: Safety Detection over Reasoning Traces of Large Reasoning Models
☆26 · Sep 27, 2025 · Updated 9 months ago
ys-zong / VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
☆90 · Jan 19, 2025 · Updated last year