haizelabs/redteaming-resistance-benchmark

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/haizelabs/redteaming-resistance-benchmark)

haizelabs / redteaming-resistance-benchmark

☆50

Alternatives and similar repositories for redteaming-resistance-benchmark

Users that are interested in redteaming-resistance-benchmark are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

NVIDIA / Trustworthy-AI
View on GitHub
NVIDIA’s repository for enabling trustworthy AI.
☆52Jun 30, 2026Updated last week
zzwjames / FailureLLMUnlearning
View on GitHub
An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)
☆39Feb 22, 2025Updated last year
leondz / autoredteam
View on GitHub
autoredteam: code for training models that automatically red team other language models
☆17Aug 9, 2023Updated 2 years ago
dzungvpham / awesome-llm4privacy
View on GitHub
A curated collection of papers and related projects on using LLMs for privacy.
☆34Oct 8, 2025Updated 9 months ago
fiveai / understanding_safety_finetuning
View on GitHub
Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)
☆12Oct 31, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
AI-secure / Knowledge-Enhanced-Machine-Learning-Pipeline
View on GitHub
Repository for Knowledge Enhanced Machine Learning Pipeline (KEMLP)
☆10Jun 5, 2021Updated 5 years ago
haizelabs / thorn-in-haizestack
View on GitHub
Thorn in a HaizeStack test for evaluating long-context adversarial robustness.
☆26Aug 3, 2024Updated last year
leondz / twokenize
View on GitHub
Python standalone tokenizer
☆14Nov 12, 2015Updated 10 years ago
yiksiu-chan / SpeakEasy
View on GitHub
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
☆14Mar 7, 2026Updated 4 months ago
haizelabs / get-haized
View on GitHub
A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.
☆99Apr 13, 2025Updated last year
spudsdude / XBMC-Flicks
View on GitHub
Netflix for XBMC
☆61Nov 13, 2012Updated 13 years ago
kangmintong / R-2-Guard
View on GitHub
[ICLR 2025] Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
☆23Jul 8, 2024Updated 2 years ago
aypan17 / reward-misspecification
View on GitHub
☆10Mar 13, 2023Updated 3 years ago
Trust4AI / ASTRAL
View on GitHub
Automated Safety Testing of Large Language Models
☆17Jan 31, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Aatrox103 / SAP
View on GitHub
☆48May 9, 2024Updated 2 years ago
lapisrocks / DiscreteAdversarialDistillation
View on GitHub
[NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"
☆11Jun 18, 2024Updated 2 years ago
nsiregar / pegelinux
View on GitHub
Blog aggregator for pegelinux community
☆10Dec 9, 2022Updated 3 years ago
kevinyaobytedance / llm_eval
View on GitHub
LLM evaluation.
☆16Nov 7, 2023Updated 2 years ago
humane-intelligence / ai_village_defcon_grt_data
View on GitHub
☆15Jun 7, 2024Updated 2 years ago
arschmitz / spider.js
View on GitHub
A command line tool for crawling a webstite for dead links, permeant and or fatal redirects, resource load issues, and script errors. It…
☆12Apr 16, 2023Updated 3 years ago
Mercor-Intelligence / apex-swe
View on GitHub
☆80Jun 25, 2026Updated 2 weeks ago
dreadnode / parley
View on GitHub
Tree of Attacks (TAP) Jailbreaking Implementation
☆120Feb 7, 2024Updated 2 years ago
pushshift / Parallel-NDJSON-Reader
View on GitHub
Parallel NDJSON Reader for Python
☆17Dec 4, 2019Updated 6 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
shunchang-liu / PAP-Pytorch
View on GitHub
Official Implementation of Harnessing Perceptual Adversarial Patches for Crowd Counting (ACM CCS)
☆18Apr 28, 2023Updated 3 years ago
StirlingGoetz / a0pentester
View on GitHub
Agent Zero (agent-zero.ai) extensions for ethical penetration testing
☆23Sep 10, 2025Updated 9 months ago
CHATS-lab / persuasive_jailbreaker
View on GitHub
Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!
☆362Oct 17, 2025Updated 8 months ago
aclu-national / tracking-ll144-bias-audits
View on GitHub
A crowd-sourced public tracker of bias audits of automated employment decision tools (AEDTs) released by employers related to NYC's Local…
☆18Nov 5, 2024Updated last year
eltociear / MolCA
View on GitHub
Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter".
☆12Dec 27, 2023Updated 2 years ago
bam098 / deep_knn
View on GitHub
PyTorch Implementation of the Deep k-Nearest-Neighbors algorithm, https://arxiv.org/abs/1803.04765
☆16Aug 18, 2020Updated 5 years ago
ivanleomk / build-hackathon-rag-ws
View on GitHub
☆10Jun 8, 2024Updated 2 years ago
mitre-atlas / atlas-data
View on GitHub
ATLAS tactics, techniques, and case studies data
☆153Jun 30, 2026Updated last week
BatsResearch / cross-lingual-detox
View on GitHub
Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024
☆18Mar 25, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
centerforaisafety / HarmBench
View on GitHub
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
☆997Aug 16, 2024Updated last year
protectskills / MaliciousAgentSkillsBench
View on GitHub
A Security Benchmark for Claude Code Agent Skills
☆68Jun 17, 2026Updated 3 weeks ago
mlukasik / rumour-classification
View on GitHub
Code to reproduce experiments from the EMNLP 2015 paper about Rumour Stance Classification with Gaussian Processes.
☆37May 23, 2016Updated 10 years ago
OpenSafetyLab / SALAD-BENCH
View on GitHub
【ACL 2024】 SALAD benchmark & MD-Judge
☆176Mar 8, 2025Updated last year
shiyuchengTJU / PAR
View on GitHub
☆14Mar 23, 2023Updated 3 years ago
wavii / pfp
View on GitHub
Pretty fast parser for probabilistic context free grammars
☆88Apr 17, 2013Updated 13 years ago
maciejkula / dynarray
View on GitHub
Dynamic Numpy arrays
☆13Feb 26, 2017Updated 9 years ago