Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content"
☆23Jul 28, 2024Updated last year
Alternatives and similar repositories for RigorLLM
Users that are interested in RigorLLM are comparing it to the libraries listed below.
- Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning☆22Jul 8, 2024Updated last year
- [COLM'24] "Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning"☆21Jun 14, 2024Updated last year
- ☆60Aug 11, 2024Updated last year
- DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems☆70Sep 29, 2024Updated last year
- Official Code for ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis"☆66Oct 27, 2024Updated last year
- ☆12Sep 29, 2024Updated last year
- Source code for Truth-Aware Context Selection: Mitigating the Hallucinations of Large Language Models Being Misled by Untruthful Contexts☆17Sep 2, 2024Updated last year
- ☆10Jul 13, 2024Updated last year
- ☆39May 17, 2025Updated 10 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated 9 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- Official Code Implementation for the CCS 2022 Paper "On the Privacy Risks of Cell-Based NAS Architectures"☆11Nov 21, 2022Updated 3 years ago
- Red Queen Dataset and data generation template☆26Dec 26, 2025Updated 3 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts☆199Jun 26, 2025Updated 9 months ago
- [ICLR 2022] Boosting Randomized Smoothing with Variance Reduced Classifiers☆11Mar 29, 2022Updated 4 years ago
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment☆27Jun 11, 2025Updated 10 months ago
- [EMNLP 2024] A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models.☆21Sep 23, 2024Updated last year
- AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks☆65Jan 15, 2026Updated 2 months ago
- Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting☆20Mar 25, 2024Updated 2 years ago
- Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs☆12Nov 7, 2024Updated last year
- ☆10Jun 5, 2021Updated 4 years ago
- The most comprehensive and accurate LLM jailbreak attack benchmark by far☆22Mar 22, 2025Updated last year
- ☆11Dec 22, 2025Updated 3 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning☆97May 23, 2024Updated last year
- [NeurIPS 2023] and [ICLR 2024] for robustness certification.☆10Nov 30, 2024Updated last year
- ☆14Jan 4, 2025Updated last year
- Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing'☆24Jun 9, 2024Updated last year
- An easy-to-use Python framework to defend against jailbreak prompts.☆21Mar 22, 2025Updated last year
- Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria…☆72Feb 19, 2023Updated 3 years ago
- Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights☆32Jan 9, 2026Updated 3 months ago
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022)☆16Feb 11, 2023Updated 3 years ago
- ☆25Jun 16, 2024Updated last year
- Code for the paper "Multi-scale Diffusion Denoised Smoothing" (NeurIPS 2023)☆15Apr 30, 2024Updated last year
- CoPur: Certifiably Robust Collaborative Inference via Feature Purification (NeurIPS 2022)☆11Dec 7, 2022Updated 3 years ago
- Official code implement of "Your Diffusion Model is Secretly a Certifiably Robust Classifier"☆18Feb 2, 2024Updated 2 years ago
- Open LLM Telemetry package☆29Nov 29, 2024Updated last year
- Code for the paper "SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness" (NeurIPS 2021)☆21Sep 27, 2022Updated 3 years ago
- Code for paper "Membership Inference Attacks Against Vision-Language Models"☆28Jan 25, 2025Updated last year
- ☆16Mar 22, 2024Updated 2 years ago