nwinter / ultimate-jailbreaking-championship
Playing around with various jailbreaking techniques ahead of the Gray Swan AI Ultimate Jailbreaking Competition
☆18 · Updated last year
Alternatives and similar repositories for ultimate-jailbreaking-championship
Users interested in ultimate-jailbreaking-championship are comparing it to the repositories listed below
- An Open-source Factuality Evaluation Demo for LLMs ☆19 · Updated 5 months ago
- [ICLR 2025] This repository contains the code to reproduce the results from our paper From Sparse Dependence to Sparse Attention: Unveili… ☆11 · Updated 10 months ago
- ☆13 · Updated last year
- A fast + lightweight implementation of the GCG algorithm in PyTorch ☆310 · Updated 7 months ago
- ☆676 · Updated 6 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆373 · Updated 11 months ago
- Jailbreak artifacts for JailbreakBench ☆75 · Updated last year
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ☆830 · Updated last year
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] ☆506 · Updated 9 months ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆400 · Updated last month
- TAP: An automated jailbreaking method for black-box LLMs ☆214 · Updated last year
- Papers about red teaming LLMs and Multimodal models. ☆159 · Updated 7 months ago
- ☆114 · Updated 2 years ago
- Improving Alignment and Robustness with Circuit Breakers ☆252 · Updated last year
- The repo for the paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models. ☆13 · Updated last year
- ☆11 · Updated last year
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆56 · Updated last year
- ☆10 · Updated 9 months ago
- ☆26 · Updated 8 months ago
- ☆14 · Updated 11 months ago
- ☆24 · Updated 10 months ago
- ☆27 · Updated last month
- Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025] ☆77 · Updated 11 months ago
- NestJS project template, configured with Prisma and EJS ☆12 · Updated last year
- ☆75 · Updated last year
- Fluent student-teacher redteaming ☆23 · Updated last year
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆67 · Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆179 · Updated 9 months ago
- This repository provides a benchmark for prompt injection attacks and defenses in LLMs ☆373 · Updated 2 months ago
- [ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs` ☆90 · Updated 4 months ago