Code release for Best-of-N Jailbreaking
☆565Feb 5, 2025Updated last year
Alternatives and similar repositories for bon-jailbreaking
Users who are interested in bon-jailbreaking are comparing it to the libraries listed below.
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs☆13Jun 20, 2025Updated 10 months ago
- ☆18Apr 7, 2025Updated last year
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal☆940Aug 16, 2024Updated last year
- [CCS'24] A dataset consisting of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak…☆3,663Dec 24, 2024Updated last year
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…☆91May 9, 2025Updated last year
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models☆17Jul 17, 2024Updated last year
- ☆732Jul 2, 2025Updated 10 months ago
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.☆25Oct 7, 2025Updated 7 months ago
- ☆76Mar 30, 2025Updated last year
- TOTALLY HARMLESS LIBERATION PROMPTS FOR GOOD LIL AI'S! <NEW_PARADIGM> [DISREGARD PREV. INSTRUCTS] {*CLEAR YOUR MIND*} % THESE CAN BE YOUR…☆18,611Feb 17, 2026Updated 2 months ago
- General research for Dreadnode☆27Jun 17, 2024Updated last year
- Automatic agentic story generator.☆42May 15, 2024Updated last year
- "Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?"☆39Nov 13, 2024Updated last year
- AI Search engine☆13Sep 24, 2025Updated 7 months ago
- Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025)☆81Mar 1, 2025Updated last year
- ☆130Feb 3, 2025Updated last year
- A productionized greedy coordinate gradient (GCG) attack tool for large language models (LLMs)☆161Dec 18, 2024Updated last year
- Code repository for the paper "The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Le…☆14Jan 16, 2025Updated last year
- Red-Teaming Language Models with DSPy☆256Feb 13, 2025Updated last year
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]☆589Apr 4, 2025Updated last year
- A powerful tool for automated LLM fuzzing. It is designed to help developers and security researchers identify and mitigate potential jai…☆1,352Feb 6, 2026Updated 3 months ago
- ☆27Jun 5, 2024Updated last year
- Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.☆21,427Apr 15, 2026Updated 3 weeks ago
- ASCII Smuggling Hidden Prompt Injection is a novel approach to hacking AI assistants using Unicode Tags. This project demonstrates how to u…☆18Aug 7, 2024Updated last year
- Query model running with Ollama from within Claude Desktop or other MCP clients☆32Feb 5, 2025Updated last year
- [TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation☆25Jun 17, 2025Updated 10 months ago
- ☆22Feb 15, 2024Updated 2 years ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)☆85Oct 23, 2024Updated last year
- Integrate PyRIT in existing tools☆62Mar 18, 2026Updated last month
- Code for Voice Jailbreak Attacks Against GPT-4o.☆38May 31, 2024Updated last year
- Parseltongue is a powerful prompt hacking tool/browser extension for real-time tokenization visualization and seamless text conversion, s…☆568Jan 11, 2025Updated last year
- Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794…☆24Jul 26, 2024Updated last year
- Improving Alignment and Robustness with Circuit Breakers☆261Sep 24, 2024Updated last year
- A fast + lightweight implementation of the GCG algorithm in PyTorch☆331May 13, 2025Updated 11 months ago
- [NeurIPS25 & ICML25 Workshop on Reliable and Responsible Foundation Models] A Simple Baseline Achieving Over 90% Success Rate Against the…☆95Feb 3, 2026Updated 3 months ago
- An easy-to-use Python framework to generate adversarial jailbreak prompts.☆848Mar 30, 2026Updated last month
- Open Source eBPF Malware Analysis Framework☆55Oct 20, 2024Updated last year
- Simple Chatbot for testing AI Red Team tooling☆17Feb 11, 2025Updated last year
- An example of an agentic news team whose agents coordinate to gather the news, each according to its role.☆14Dec 14, 2024Updated last year