andyzoujm / breaking-llama-guard
Code to break Llama Guard
☆32 · Dec 7, 2023 · Updated 2 years ago
Alternatives and similar repositories for breaking-llama-guard
Users that are interested in breaking-llama-guard are comparing it to the libraries listed below
- Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025) ☆75 · Mar 1, 2025 · Updated 11 months ago
- ACL24 ☆11 · Jun 7, 2024 · Updated last year
- ☆10 · Oct 31, 2022 · Updated 3 years ago
- ☆14 · Dec 27, 2020 · Updated 5 years ago
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs ☆13 · Jun 20, 2025 · Updated 7 months ago
- Code for our paper "Localizing Lying in Llama" ☆13 · Apr 24, 2025 · Updated 9 months ago
- ☆35 · May 21, 2025 · Updated 8 months ago
- Cross-library augmentation toolbox supporting 300 operators over 8 libraries + AI transforms ☆13 · Jan 11, 2022 · Updated 4 years ago
- Repository for "StrongREJECT for Empty Jailbreaks" paper ☆151 · Nov 3, 2024 · Updated last year
- Forcing Diffuse Distributions out of Language Models ☆18 · Sep 10, 2024 · Updated last year
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆173 · Apr 23, 2025 · Updated 9 months ago
- Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024] ☆21 · Apr 15, 2024 · Updated last year
- A network-level collaboration framework for personal mobile devices ☆15 · Jun 24, 2020 · Updated 5 years ago
- ☆47 · Sep 29, 2024 · Updated last year
- ☆44 · Oct 1, 2024 · Updated last year
- Official Repository for Dataset Inference for LLMs ☆43 · Jul 25, 2024 · Updated last year
- The library for symbolic interval ☆22 · Jun 23, 2020 · Updated 5 years ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆127 · Feb 24, 2025 · Updated 11 months ago
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆43 · Apr 28, 2024 · Updated last year
- Source code of "What can linearized neural networks actually say about generalization?" ☆20 · Oct 21, 2021 · Updated 4 years ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆89 · Mar 30, 2025 · Updated 10 months ago
- TheNZT is a powerful multi-agent finance query processing system designed to process and respond to finance-related queries efficiently. … ☆30 · Feb 3, 2026 · Updated last week
- This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition. ☆90 · May 19, 2024 · Updated last year
- ☆21 · Oct 9, 2020 · Updated 5 years ago
- "Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers" (NeurIPS 2019, previously called "A Stratified Approach … ☆17 · Nov 16, 2019 · Updated 6 years ago
- ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety. ☆93 · May 9, 2024 · Updated last year
- Data for "Datamodels: Predicting Predictions with Training Data" ☆97 · May 25, 2023 · Updated 2 years ago
- Reimplementation of the WeFDE information leakage analysis technique for website fingerprinting analysis in Python3. ☆23 · Oct 30, 2020 · Updated 5 years ago
- Official repo for the paper "Make Some Noise: Reliable and Efficient Single-Step Adversarial Training" (https://arxiv.org/abs/2202.01181) ☆25 · Oct 17, 2022 · Updated 3 years ago
- Code for ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models ☆37 · Jun 1, 2025 · Updated 8 months ago
- Language models scale reliably with over-training and on downstream tasks ☆99 · Apr 2, 2024 · Updated last year
- Tools for the CADCD dataset ☆24 · Aug 30, 2019 · Updated 6 years ago
- Implementation of "Website Fingerprinting at Internet Scale" ☆23 · Feb 24, 2023 · Updated 2 years ago
- The repository contains code for Adaptive Data Optimization ☆32 · Dec 9, 2024 · Updated last year
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆338 · Feb 23, 2024 · Updated last year
- Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment ☆108 · Mar 8, 2024 · Updated last year
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆32 · Jan 23, 2025 · Updated last year
- ☆193 · Nov 26, 2023 · Updated 2 years ago
- β-CROWN: Efficient Bound Propagation with Per-neuron Split Constraints for Neural Network Verification ☆31 · Nov 9, 2021 · Updated 4 years ago