xunguangwang/SoK4JailbreakGuardrails

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/xunguangwang/SoK4JailbreakGuardrails)

xunguangwang / SoK4JailbreakGuardrails

[S&P 2026] SoK: Evaluating Jailbreak Guardrails for Large Language Models

☆43

Alternatives and similar repositories for SoK4JailbreakGuardrails

Users that are interested in SoK4JailbreakGuardrails are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

selfdefend / Code
View on GitHub
☆34Jan 26, 2025Updated last year
khhung-906 / Attention-Tracker
View on GitHub
Code for our NAACL2025 accepted paper: Attention Tracker: Detecting Prompt Injection Attacks in LLMs
☆28Sep 19, 2025Updated 9 months ago
AAAAAAsuka / llm_defends
View on GitHub
code of paper "Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM"
☆14Nov 17, 2023Updated 2 years ago
zshuai8 / FedGMM_ICML2023
View on GitHub
Personalized Federated Learning under Mixture of Distributions
☆19Jan 2, 2024Updated 2 years ago
MadryLab / AT2
View on GitHub
Attribute statements generated by LLMs to preceding tokens using attention weights.
☆27Apr 22, 2025Updated last year
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
facebookresearch / wasp
View on GitHub
Official implementation of the WASP web agent security benchmark
☆97Apr 13, 2026Updated 2 months ago
rgb91 / temporal-deepfake-segmentation
View on GitHub
Transformer Model to detect deepfakes from popular datasets. Predictions made on embeddings (features) generated by a different ViT model…
☆14Nov 27, 2023Updated 2 years ago
parasgulati8 / NinaPro-Helper-Library
View on GitHub
A helper library to perform essential manipulations on time-series EMG data of NinaPro DB2
☆12Mar 21, 2020Updated 6 years ago
Yuanyuan-Yuan / CipherSteal
View on GitHub
☆16Sep 17, 2024Updated last year
tsagkas / MyoUP_dataset
View on GitHub
A custom sEMG dataset developed with the Myo Armband for hand gesture classification
☆44Apr 20, 2020Updated 6 years ago
BoZhuBo / SeNic
View on GitHub
☆25Jan 11, 2023Updated 3 years ago
Lif3line / nina_helper_package_mk2
View on GitHub
Python functions and important data for working on NinaPro database 1 & 2
☆21Dec 27, 2022Updated 3 years ago
tonychenxyz / selfie
View on GitHub
This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,…
☆58Dec 9, 2024Updated last year
Cecile-hi / Regularized-Adaptive-Weight-Modification
View on GitHub
Continual Learning Method RAWM for ICML 2023
☆23Sep 26, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Bob-cheng / DepthModelHardening
View on GitHub
Official PyTorch implementation of our paper "Adversarial Training of Self-supervised Monocular Depth Estimation against Physical-World A…
☆11Feb 8, 2023Updated 3 years ago
compsec-snu / pfi
View on GitHub
PFI: Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents
☆30Mar 26, 2025Updated last year
aaFrostnova / Papillon
View on GitHub
[Usenix Security 2025] Official repo of paper PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs
☆69Nov 17, 2025Updated 7 months ago
ucsb-mlsec / Awesome-Agent-Security
View on GitHub
☆58Jun 24, 2026Updated 2 weeks ago
Ruiyang-061X / Awesome-MLLM-Reasoning
View on GitHub
📖Curated list about reasoning abilitiy of MLLM, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.
☆13Feb 7, 2025Updated last year
VimalWill / Vstream
View on GitHub
Vstream - Video Analytics pipeline with Hardware based accelerations (dev - stage)
☆10Feb 2, 2024Updated 2 years ago
AI-secure / AgentPoison
View on GitHub
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
☆228Jun 17, 2026Updated 3 weeks ago
HydroXai / Enhancing-Safety-in-Large-Language-Models
View on GitHub
Precision Knowledge Editing (PKE): A novel method to reduce toxicity in LLMs while preserving performance, with robust evaluations and ha…
☆11Nov 26, 2024Updated last year
PRIS-CV / MSSRM
View on GitHub
An implementation of MSSRM method
☆10Mar 23, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
BillChan226 / SafeWatch
View on GitHub
[ICLR 2025] Official implementation for "SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanati…
☆45Feb 11, 2025Updated last year
Suguru55 / Wearable_Sensor_Long-term_sEMG_Dataset
View on GitHub
published in Biomedical Signal Processing and Control
☆12May 16, 2020Updated 6 years ago
jzhang538 / CorruptEncoder
View on GitHub
[CVPR 2024] "Data Poisoning based Backdoor Attacks to Contrastive Learning": official code implementation.
☆16Feb 10, 2025Updated last year
hieulem / ADNET_demo
View on GitHub
☆11Nov 30, 2018Updated 7 years ago
DexterJZ / eval_driving_safety
View on GitHub
Evaluating Adversarial Attacks on Driving Safety in Vision-Based Autonomous Vehicles
☆20Jul 26, 2023Updated 2 years ago
pmixer / otb.plugin
View on GitHub
Bridging python and matlab version OTB
☆11Jan 12, 2018Updated 8 years ago
cqu20160901 / DETR_onnx_tensorRT_V2
View on GitHub
DETR tensor去除推理过程无用辅助头+fp16部署再次加速+解决转tensorrt 输出全为0问题的新方法。
☆11Jan 9, 2024Updated 2 years ago
sachink1729 / SQL-Agents-Using-RAG-DSPy-Groq
View on GitHub
Exploring advanced prompting tools to query SQL database with multiple tables in natural language using LLMs
☆16Aug 23, 2024Updated last year
ielab / FeB4RAG
View on GitHub
This repo contains information about FeB4RAG collection
☆17Feb 19, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
ASTRAL-Group / ASTRA
View on GitHub
[CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre…
☆61Jul 5, 2025Updated last year
MidiyaZhu / MePO
View on GitHub
Code for Rethinking Prompt Optimizers: From Prompt Merits to Optimization
☆13Jan 12, 2026Updated 5 months ago
TASER2023 / TASER
View on GitHub
☆14Apr 6, 2025Updated last year
Greysahy / ipiguard
View on GitHub
[EMNLP 2025 Oral] IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents
☆22Sep 16, 2025Updated 9 months ago
Linzwcs / AFT
View on GitHub
☆13Jan 22, 2025Updated last year
SangitaPokhrel911 / PDF-Summarizer-End-to-End-Project
View on GitHub
☆12Aug 3, 2024Updated last year
SyedUmaidAhmed / Federated-and-Split-Learning-on-Edge
View on GitHub
This is the official implementation of Federated and Split Learning on multiple Raspberry Pi's. It is the demonstration of training the d…
☆13Nov 16, 2022Updated 3 years ago