☆50 · Aug 3, 2024 · Updated last year
Alternatives and similar repositories for redteaming-resistance-benchmark
Users that are interested in redteaming-resistance-benchmark are comparing it to the libraries listed below.
- LLM red teaming datasets from the paper 'Student-Teacher Prompting for Red Teaming to Improve Guardrails' for the ART of Safety Workshop … ☆26 · Oct 12, 2023 · Updated 2 years ago
- ☆16 · May 30, 2024 · Updated 2 years ago
- NVIDIA’s repository for enabling trustworthy AI. ☆47 · Jun 3, 2026 · Updated 2 weeks ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆38 · Feb 22, 2025 · Updated last year
- autoredteam: code for training models that automatically red team other language models ☆16 · Aug 9, 2023 · Updated 2 years ago
- A curated collection of papers and related projects on using LLMs for privacy. ☆32 · Oct 8, 2025 · Updated 8 months ago
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1 ☆13 · Apr 23, 2025 · Updated last year
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆98 · Apr 13, 2025 · Updated last year
- [ACL 2024] Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation ☆10 · May 26, 2024 · Updated 2 years ago
- Automated Safety Testing of Large Language Models ☆18 · Jan 31, 2025 · Updated last year
- Implementation of BEAST adversarial attack for language models (ICML 2024) ☆88 · May 14, 2024 · Updated 2 years ago
- ☆47 · May 9, 2024 · Updated 2 years ago
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models" ☆11 · Jun 18, 2024 · Updated 2 years ago
- LLM evaluation. ☆16 · Nov 7, 2023 · Updated 2 years ago
- ☆15 · Jun 7, 2024 · Updated 2 years ago
- ☆27 · May 20, 2025 · Updated last year
- ☆34 · Sep 19, 2025 · Updated 9 months ago
- AutoEDA: An Automated Exploratory Data Analysis (EDA) Toolkit. Simplify and automate your data exploration process with AutoEDA. This ope… ☆21 · Nov 11, 2023 · Updated 2 years ago
- Official Implementation of Harnessing Perceptual Adversarial Patches for Crowd Counting (ACM CCS) ☆18 · Apr 28, 2023 · Updated 3 years ago
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them! ☆359 · Oct 17, 2025 · Updated 8 months ago
- MCP easy installer is a robust MCP server with tools to search, install, configure, repair and uninstall MCP servers ☆17 · Apr 19, 2025 · Updated last year
- Welcome to MitreMesh. Where MITRE's framework meets dynamic scenario generation, creating a comprehensive net of incident response tests … ☆16 · Sep 5, 2023 · Updated 2 years ago
- A crowd-sourced public tracker of bias audits of automated employment decision tools (AEDTs) released by employers related to NYC's Local… ☆18 · Nov 5, 2024 · Updated last year
- A Security Benchmark for Claude Code Agent Skills ☆59 · Jun 1, 2026 · Updated 2 weeks ago
- Open Imi is an open-source Claude desktop alternative for developers, engineers and tech teams to hack MCPs and agents to their own likin… ☆10 · Nov 16, 2025 · Updated 7 months ago
- Code for EMNLP 2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter". ☆12 · Dec 27, 2023 · Updated 2 years ago
- ☆10 · Jun 8, 2024 · Updated 2 years ago
- ATLAS tactics, techniques, and case studies data ☆147 · May 27, 2026 · Updated 3 weeks ago
- An interactive CLI application for interacting with authenticated Jupyter instances. ☆57 · May 7, 2025 · Updated last year
- Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024 ☆18 · Mar 25, 2025 · Updated last year
- Code and datasets for the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment ☆111 · Mar 8, 2024 · Updated 2 years ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ☆981 · Aug 16, 2024 · Updated last year
- PyTorch implementation of Expectation over Transformation ☆13 · Jul 18, 2025 · Updated 11 months ago
- 【ACL 2024】 SALAD benchmark & MD-Judge ☆176 · Mar 8, 2025 · Updated last year
- ☆14 · Mar 23, 2023 · Updated 3 years ago
- Sync MCP (Model Context Protocol) configurations across AI tools ☆46 · Jun 20, 2025 · Updated 11 months ago
- Pretty fast parser for probabilistic context free grammars ☆88 · Apr 17, 2013 · Updated 13 years ago
- Privacy backdoors ☆50 · Apr 28, 2024 · Updated 2 years ago