Jinxiaolong1129/Foot-in-the-door-Jailbreak

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Jinxiaolong1129/Foot-in-the-door-Jailbreak)

Jinxiaolong1129 / Foot-in-the-door-Jailbreak

☆20

Alternatives and similar repositories for Foot-in-the-door-Jailbreak

Users that are interested in Foot-in-the-door-Jailbreak are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

AI45Lab / X-Boundary
View on GitHub
[EMNLP 2025] The code repo of paper "X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Com…
☆40Nov 24, 2025Updated 5 months ago
AI45Lab / ActorAttack
View on GitHub
☆133Feb 3, 2025Updated last year
weiyezhimeng / SQL-Injection-Jailbreak
View on GitHub
☆22Jul 26, 2025Updated 9 months ago
YiyiyiZhao / siren
View on GitHub
Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models …
☆15Sep 12, 2025Updated 8 months ago
leileqiTHU / Attacker
View on GitHub
The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1
☆13Apr 23, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
listen0425 / Safety-Layers
View on GitHub
code space of paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025)
☆24Apr 26, 2025Updated last year
xirui-li / DrAttack
View on GitHub
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆66Aug 25, 2024Updated last year
vfleaking / PTST
View on GitHub
Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"
☆22Sep 21, 2025Updated 7 months ago
AIM-Intelligence / Automated-Multi-Turn-Jailbreaks
View on GitHub
☆134Dec 3, 2025Updated 5 months ago
thu-coai / AISafetyLab
View on GitHub
AISafetyLab: A comprehensive framework covering safety attack, defense, evaluation and paper list.
☆241Apr 21, 2026Updated 3 weeks ago
yjw1029 / EmbMarker
View on GitHub
Code and data for our paper "Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark"…
☆52Jul 11, 2023Updated 2 years ago
TrustGen / TrustEval-toolkit
View on GitHub
[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models.
☆130Aug 22, 2025Updated 8 months ago
lapisrocks / rpo
View on GitHub
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
☆62Aug 8, 2024Updated last year
junchaoIU / DetectRL
View on GitHub
[NeurIPS 2024 D&B] DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios
☆15Nov 19, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
yiksiu-chan / SpeakEasy
View on GitHub
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
☆14Mar 7, 2026Updated 2 months ago
IBM / Adversarial-Prompt-Evaluation
View on GitHub
Code Implementation of Adversarial Prompt Evaluation paper
☆14Sep 18, 2025Updated 8 months ago
git-disl / Booster
View on GitHub
This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba…
☆36Mar 22, 2025Updated last year
AI45Lab / CodeAttack
View on GitHub
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆59Oct 1, 2025Updated 7 months ago
clearloveclearlove / BEAT
View on GitHub
☆14Feb 26, 2025Updated last year
yueliu1999 / FlipAttack
View on GitHub
[ICML 2025] An official source code for paper "FlipAttack: Jailbreak LLMs via Flipping".
☆172May 2, 2025Updated last year
Meirtz / BabyBLUE-llm
View on GitHub
[COLING 2025] Official repo of paper: "Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jail…
☆12Jul 26, 2024Updated last year
Lucas-TY / llm_Implicit_reference
View on GitHub
Official Implementation of implicit reference attack
☆11Oct 16, 2024Updated last year
PurCL / ASTRA
View on GitHub
🥇 Amazon Nova AI Challenge Winner - ASTRA emerged victorious as the top attacking team in Amazon's global AI safety competition, defeati…
☆70May 11, 2026Updated last week
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
w-yibo / R1-Compress
View on GitHub
[NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
☆17Jan 24, 2026Updated 3 months ago
SproutNan / AI-Safety_Benchmark
View on GitHub
The official repository for guided jailbreak benchmark
☆29Jul 28, 2025Updated 9 months ago
jiah-li / magic
View on GitHub
The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.
☆14Dec 16, 2024Updated last year
rmin2000 / adv_tracing
View on GitHub
Identification of the Adversary from a Single Adversarial Example (ICML 2023)
☆10Jul 15, 2024Updated last year
princeton-polaris-lab / Evaluating-Durable-Safeguards
View on GitHub
[ICLR 2025] On Evluating the Durability of Safegurads for Open-Weight LLMs
☆13Jun 20, 2025Updated 10 months ago
zjunlp / LookAheadTuning
View on GitHub
[WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews
☆17Dec 14, 2025Updated 5 months ago
BRZ911 / Wrong-of-Thought
View on GitHub
[EMNLP 2024 Findings] Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information
☆13Oct 1, 2024Updated last year
tanganke / subspace_fusion
View on GitHub
Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"
☆14Mar 28, 2024Updated 2 years ago
SCIR-SC-Qiaoban-Team / FreeEvalLM
View on GitHub
[AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…
☆10Feb 7, 2026Updated 3 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
IBM / NeuralFuse
View on GitHub
[NeurIPS'24] "NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes" by Hao-Lun …
☆10Sep 18, 2025Updated 8 months ago
HuangZhiChao95 / ATAS
View on GitHub
☆12Oct 29, 2023Updated 2 years ago
MaheepChaudhary / SAE-Ravel
View on GitHub
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13Jan 26, 2025Updated last year
XuandongZhao / weak-to-strong
View on GitHub
[ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models
☆90May 2, 2025Updated last year
diaoquesang / Code-in-Paper-Guide
View on GitHub
🌟 手把手教你在论文中插入代码链接
☆25Aug 2, 2025Updated 9 months ago
lapisrocks / DiscreteAdversarialDistillation
View on GitHub
[NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"
☆11Jun 18, 2024Updated last year
thestephencasper / feature_level_adv
View on GitHub
Demo code for the paper: One Thing to Fool them All: Generating Interpretable, Universal, and Physically-Realizable Adversarial Features
☆12Nov 30, 2023Updated 2 years ago