chuhac/Reasoning-to-Defend

[EMNLP 2025] Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking

☆12

Alternatives and similar repositories for Reasoning-to-Defend

Users that are interested in Reasoning-to-Defend are comparing it to the libraries listed below.

CryptoAILab / misalignment
[NDSS'25] The official implementation of safety misalignment.
☆19 · Jan 8, 2025 · Updated last year
InvokerStark / OverKill
☆15 · Jun 13, 2024 · Updated 2 years ago
zjunlp / AutoSteer
[EMNLP 2025] AutoSteer: Automating Steering for Safe Multimodal Large Language Models
☆15 · Aug 21, 2025 · Updated 11 months ago
HanjiangHu / NBF-LLM
The official code for "Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks".
☆18 · Jun 24, 2026 · Updated last month
pandazzh2020 / ExTES
☆19 · Jun 4, 2024 · Updated 2 years ago
yuki-younai / MTSA
Official implementation of MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
☆16 · Jun 2, 2025 · Updated last year
ethz-spylab / jailbreak-tax
☆24 · Feb 17, 2026 · Updated 5 months ago
yuplin2333 / representation-space-jailbreak
Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794…
☆24 · Jul 26, 2024 · Updated last year
kangmintong / R-2-Guard
[ICLR 2025] Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
☆23 · Jul 8, 2024 · Updated 2 years ago
thu-ml / STAIR
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
☆89 · Feb 26, 2025 · Updated last year
yuelinan / Awesome-Efficient-R1-style-LRMs
☆53 · Jul 12, 2026 · Updated last week
MurrayTom / SG-Bench
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
☆26 · Nov 29, 2024 · Updated last year
ybwang119 / Awesome-reasoning-safety
This repo is for the safety topic, including attacks, defenses and studies related to reasoning and RL
☆66 · Sep 5, 2025 · Updated 10 months ago
Jinxiaolong1129 / Foot-in-the-door-Jailbreak
☆23 · May 14, 2025 · Updated last year
dukeceicenter / jailbreak-reasoning-openai-o1o3-deepseek-r1
☆121 · Apr 27, 2025 · Updated last year
wwh0411 / FedMABench
[EMNLP 2025 Main Oral] FedMABench: Benchmarking Mobile GUI Agents on Decentralized Heterogeneous User Data.
☆16 · Nov 11, 2025 · Updated 8 months ago
iliaishacked / sponge_examples
☆35 · Oct 14, 2021 · Updated 4 years ago
Sautenich / UAV-CodeAgents
The repo for code that hasn't been published yet
☆14 · May 14, 2025 · Updated last year
RPIDIAL / BI-Mamba
Source code of BI-Mamba for cardiovascular disease detection from two-view chest X-rays
☆15 · Dec 10, 2025 · Updated 7 months ago
UbiquantAI / IDO
Turn every moment into momentum
☆22 · Jun 1, 2026 · Updated last month
zhxieml / remiss-jailbreak
☆33 · Jun 24, 2024 · Updated 2 years ago
NY1024 / RACE
☆27 · Mar 17, 2025 · Updated last year
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆87 · Nov 3, 2024 · Updated last year
ZZR0 / CodeAttack
Adversarial Attack for Pre-trained Code Models
☆10 · Jul 19, 2022 · Updated 4 years ago
wonderNefelibata / Awesome-LRM-Safety
Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as …
☆84 · Updated this week
UCSC-VLAA / STAR-1
[AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data
☆38 · Apr 7, 2025 · Updated last year
huizhang-L / CodeChameleon
☆30 · Mar 20, 2024 · Updated 2 years ago
PKU-YuanGroup / AsFT
Code for the paper "AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin".
☆37 · Jul 10, 2025 · Updated last year
UCSB-AI / MSSBench
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
☆36 · Jun 23, 2025 · Updated last year
CraigMyles / cggm-mammography-classification
Chinese Mammography Database (CMMD dataset) Deep Learning Classification Pipeline
☆16 · Mar 15, 2022 · Updated 4 years ago
listen0425 / Safety-Layers
Code space for the paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025)
☆25 · Apr 26, 2025 · Updated last year
piaohongming / Powder
☆21 · Jun 17, 2024 · Updated 2 years ago
KanghoonYoon / torch-rasgg
This is an anonymous repository for submitting our work to a conference
☆14 · Dec 17, 2024 · Updated last year
acl-org / acl-2025
☆15 · Aug 7, 2025 · Updated 11 months ago
dmhyun / MSRP
Official repository of Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization [EMNLP'22 …
☆10 · May 20, 2023 · Updated 3 years ago
YancyKahn / CoA
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆39 · Jan 17, 2025 · Updated last year
intelligent-soft-robots / learning_table_tennis_from_scratch
☆18 · Jun 17, 2026 · Updated last month
ybwang119 / label_recovery
[ICLR 2024] Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks
☆14 · Feb 6, 2024 · Updated 2 years ago
SaFo-Lab / JailBreakV_28K
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
☆96 · May 9, 2025 · Updated last year