kangmintong/R-2-Guard

[ICLR 2025] Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

☆23

Alternatives and similar repositories for R-2-Guard

Users that are interested in R-2-Guard are comparing it to the libraries listed below.

SalesforceAIResearch / BingoGuard
☆15 · Jun 2, 2026 · Updated last month
thefcraft / prompt-generator-stable-diffusion
Prompt Generator model for Stable Diffusion Models
☆12 · Jun 20, 2023 · Updated 3 years ago
zhipeng-wei / EmojiAttack
Emoji Attack [ICML 2025]
☆46 · Jul 15, 2025 · Updated last year
epfml / pam
☆16 · Dec 9, 2023 · Updated 2 years ago
chuhac / Reasoning-to-Defend
[EMNLP 2025] Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
☆12 · Aug 22, 2025 · Updated 11 months ago
TeamPigeonLab / CS-DJ
Accepted by CVPR 2025 (Highlight)
☆25 · Jun 8, 2025 · Updated last year
AmenRa / GuardBench
A Python library for guardrail models evaluation.
☆37 · Oct 9, 2025 · Updated 9 months ago
egochao / transformer_with_einsum
Transformer from scratch with einsum method
☆11 · Jul 8, 2021 · Updated 5 years ago
jiaxiaojunQAQ / FP-Better
Code for Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks (TIFS2024)
☆13 · Mar 29, 2024 · Updated 2 years ago
OSU-NLP-Group / EIA_against_webagent
☆40 · Oct 2, 2024 · Updated last year
Jinxiaolong1129 / Foot-in-the-door-Jailbreak
☆23 · May 14, 2025 · Updated last year
ydc123 / MMP-Attack
Official repository for "On the Multi-modal Vulnerability of Diffusion Models"
☆17 · Jul 15, 2024 · Updated 2 years ago
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
☆80 · Mar 30, 2025 · Updated last year
ml-research / LlavaGuard
☆71 · Sep 30, 2025 · Updated 9 months ago
ml-research / SLASH
Scalable Neural-Probabilistic Answer Set Programming
☆18 · May 23, 2024 · Updated 2 years ago
yiksiu-chan / SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
☆14 · Mar 7, 2026 · Updated 4 months ago
cambridge-mlg / RAT-SPN
Code for UAI'19: Random Sum-Product Networks: A Simple and Effective Approach to Probabilistic Deep Learning
☆38 · Jun 7, 2020 · Updated 6 years ago
thu-coai / Agent-SafetyBench
☆149 · Aug 11, 2025 · Updated 11 months ago
gabegrand / battleship
Official repo for Shoot First, Ask Questions Later?
☆24 · Apr 23, 2026 · Updated 3 months ago
pkulcwmzx / knowledge-boundary
[ACL 2024] Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation
☆10 · May 26, 2024 · Updated 2 years ago
yueliu1999 / GuardReasoner
[ICLR Workshop 2025] Official source code for the paper "GuardReasoner: Towards Reasoning-based LLM Safeguards".
☆175 · May 19, 2025 · Updated last year
tmllab / 2025_ICLR_PiF
☆40 · May 17, 2025 · Updated last year
eurekayuan / RigorLLM
Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content"
☆24 · Jul 28, 2024 · Updated last year
HashmatShadab / Robust-LLaVA
[ICCVW 2025 (Oral)] Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
☆29 · Oct 20, 2025 · Updated 9 months ago
MaTengSYSU / HIMRD-jailbreak
Code repository for the paper "Heuristic Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models"
☆19 · Aug 7, 2025 · Updated 11 months ago
MinhZou / selective-copying-mamba
Selective Copying Task with Mamba Model. This repository contains a simple implementation for reproducing the selective copying task with…
☆14 · Jun 3, 2024 · Updated 2 years ago
ejones313 / roben
☆12 · Mar 7, 2021 · Updated 5 years ago
roywang021 / UMK
Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models
☆34 · Dec 30, 2024 · Updated last year
CryptoAILab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆211 · Jun 26, 2025 · Updated last year
thu-coai / ShieldLM
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]
☆231 · Sep 29, 2024 · Updated last year
alenai97 / PEFT-MLLM
Official code and data for the ACL 2024 Findings paper "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models"
☆25 · Nov 10, 2024 · Updated last year
allenai / wildguard
Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
☆131 · Dec 2, 2024 · Updated last year
chanchimin / AgentMonitor
Code for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems"
☆13 · Dec 13, 2024 · Updated last year
AI-secure / AdvAgent
☆25 · May 28, 2025 · Updated last year
SaFo-Lab / AdaShield
[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…
☆73 · Feb 9, 2026 · Updated 5 months ago
naver-ai / JOOD
[CVPR 2025] Official implementation for JOOD "Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy"
☆21 · Jun 11, 2025 · Updated last year
UCSB-AI / MSSBench
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
☆36 · Jun 23, 2025 · Updated last year
ZixuanNi / Mod-X
Reproduction of the paper "Continual Vision-Language Representation Learning with Off-Diagonal Information" (Mod-X)
☆12 · Oct 31, 2023 · Updated 2 years ago
limenlp / safer-instruct
This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17 · Feb 22, 2024 · Updated 2 years ago