eurekayuan/RigorLLM

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/eurekayuan/RigorLLM)

eurekayuan / RigorLLM

Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content"

☆24

Alternatives and similar repositories for RigorLLM

Users that are interested in RigorLLM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

luka-group / vlm-knowledge-conflict
View on GitHub
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆54Oct 19, 2024Updated last year
OSU-NLP-Group / Deductive-Beam-Search
View on GitHub
[COLM'24] "Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning"
☆21Jun 14, 2024Updated 2 years ago
multimodal-art-projection / I-SHEEP
View on GitHub
I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment
☆17Jan 16, 2025Updated last year
xyq7 / GradSafe
View on GitHub
Official Code for ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis"
☆68Oct 27, 2024Updated last year
Victorwz / LaViA
View on GitHub
☆10Jul 13, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
SaFo-Lab / JailBreakV_28K
View on GitHub
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
☆96May 9, 2025Updated last year
GAIR-NLP / Safety-J
View on GitHub
Safety-J: Evaluating Safety with Critique
☆16Jul 28, 2024Updated last year
kangmintong / R-2-Guard
View on GitHub
[ICLR 2025] Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
☆23Jul 8, 2024Updated 2 years ago
STAIR-BUPT / STAIR-LLMGuardrails
View on GitHub
☆12Sep 29, 2024Updated last year
abc03570128 / Jailbreaking-Attack-against-Multimodal-Large-Language-Model
View on GitHub
☆63Aug 11, 2024Updated last year
mila-iqia / Casande-RL
View on GitHub
Casande-RL
☆11May 9, 2023Updated 3 years ago
Junjie-Ye / ToolSword
View on GitHub
[ACL 2024] ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages
☆15Sep 12, 2024Updated last year
MiracleHH / nas_privacy
View on GitHub
Official Code Implementation for the CCS 2022 Paper "On the Privacy Risks of Cell-Based NAS Architectures"
☆11Nov 21, 2022Updated 3 years ago
ToolBeHonest / ToolBeHonest
View on GitHub
[EMNLP 2024] A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models.
☆22Sep 23, 2024Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
eth-sri / smoothing-ensembles
View on GitHub
[ICLR 2022] Boosting Randomized Smoothing with Variance Reduced Classifiers
☆11Mar 29, 2022Updated 4 years ago
Anni-Zou / DocBench
View on GitHub
DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems
☆80Sep 29, 2024Updated last year
flageval-baai / HalluDial
View on GitHub
☆21Aug 19, 2024Updated last year
theshi-1128 / llm-defense
View on GitHub
An easy-to-use Python framework to defend against jailbreak prompts.
☆21Mar 22, 2025Updated last year
XHMY / AutoDefense
View on GitHub
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
☆68Jan 15, 2026Updated 6 months ago
theshi-1128 / jailbreak-bench
View on GitHub
The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆21Mar 22, 2025Updated last year
xlhex / extract_and_transfer
View on GitHub
☆10Jun 5, 2021Updated 5 years ago
alphadl / SafeLLM_with_IntentionAnalysis
View on GitHub
Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting
☆21Mar 25, 2024Updated 2 years ago
qizhangli / Gradient-based-Jailbreak-Attacks
View on GitHub
Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs
☆12Nov 7, 2024Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
SafeAILab / RAIN
View on GitHub
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆99May 23, 2024Updated 2 years ago
hukkai / liresnet
View on GitHub
[NeurIPS 2023] and [ICLR 2024] for robustness certification.
☆10Nov 30, 2024Updated last year
UCSB-NLP-Chang / SemanticSmooth
View on GitHub
Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing'
☆24Jun 9, 2024Updated 2 years ago
thunlp / Advbench
View on GitHub
Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria…
☆80Feb 19, 2023Updated 3 years ago
jh-jeong / smoothing-multiscale
View on GitHub
Code for the paper "Multi-scale Diffusion Denoised Smoothing" (NeurIPS 2023)
☆15Apr 30, 2024Updated 2 years ago
uwFengyuan / OCC-CLIP
View on GitHub
☆14Jan 4, 2025Updated last year
tmllab / 2025_ICLR_PiF
View on GitHub
☆40May 17, 2025Updated last year
huanranchen / NoisedDiffusionClassifiers
View on GitHub
Official code implement of "Your Diffusion Model is Secretly a Certifiably Robust Classifier"
☆18Feb 2, 2024Updated 2 years ago
xiye17 / TextualExplInContext
View on GitHub
The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022)
☆16Feb 11, 2023Updated 3 years ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
ledllm / ledllm
View on GitHub
☆24Jun 16, 2024Updated 2 years ago
AI-secure / CoPur
View on GitHub
CoPur: Certifiably Robust Collaborative Inference via Feature Purification (NeurIPS 2022)
☆11Dec 7, 2022Updated 3 years ago
CryptoAPI-Bench / CryptoAPI-Bench
View on GitHub
☆14Jan 10, 2024Updated 2 years ago
theoxo / self-repair
View on GitHub
[ICLR 2024]: Is Self-Repair a Silver Bullet for Code Generation?
☆15May 2, 2024Updated 2 years ago
erfanshayegani / Jailbreak-In-Pieces
View on GitHub
[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…
☆93Jun 6, 2024Updated 2 years ago
tmbdev-archive / webdataset-imagenet-2
View on GitHub
A small repository demonstrating the use of Webdataset and Imagenet
☆17Dec 19, 2023Updated 2 years ago
yuqiChen94 / Swat_Simulator
View on GitHub
☆14Dec 27, 2020Updated 5 years ago