Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024)
☆18 · Oct 22, 2024 · Updated last year
Alternatives and similar repositories for RLbreaker
Users that are interested in RLbreaker are comparing it to the libraries listed below.
- ☆63 · Aug 11, 2024 · Updated last year
- 🎮 A configurable Breakout environment for reinforcement learning ☆11 · Mar 20, 2018 · Updated 8 years ago
- A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation ☆68 · Mar 2, 2026 · Updated 2 months ago
- ☆14 · Jan 4, 2025 · Updated last year
- [CVPR 2025] Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models ☆12 · Sep 16, 2025 · Updated 7 months ago
- The official repository for guided jailbreak benchmark ☆29 · Jul 28, 2025 · Updated 9 months ago
- [EMNLP 2025 Oral] IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents ☆18 · Sep 16, 2025 · Updated 7 months ago
- Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT ☆36 · Oct 15, 2023 · Updated 2 years ago
- ☆11 · Dec 8, 2024 · Updated last year
- Code for the API, workload execution, and agents underlying the LLMail-Inject Adaptive Prompt Injection Challenge ☆23 · Apr 9, 2026 · Updated last month
- [ICLR 2025 Spotlight] The official implementation of our ICLR2025 paper "AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to… ☆363 · Oct 8, 2025 · Updated 7 months ago
- Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs ☆12 · Nov 7, 2024 · Updated last year
- This approach of Intrusion Detection uses two GPT models, which are trained on normal network traffic, to predict sequences of communicat… ☆11 · Oct 3, 2023 · Updated 2 years ago
- The first toolkit for MLRM safety evaluation, providing unified interface for mainstream models, datasets, and jailbreaking methods! ☆15 · Apr 8, 2025 · Updated last year
- [NDSS'25] The official implementation of safety misalignment. ☆19 · Jan 8, 2025 · Updated last year
- Implementation of the paper "Improving the Accuracy-Robustness Trade-off of Classifiers via Adaptive Smoothing". ☆10 · Feb 6, 2024 · Updated 2 years ago
- [ICLR 2025] Official implementation of 'Hidden in the Noise: Two-Stage Robust Watermarking for Images' ☆13 · May 5, 2025 · Updated last year
- ☆18 · Aug 15, 2022 · Updated 3 years ago
- [EMNLP 2025] Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking ☆12 · Aug 22, 2025 · Updated 8 months ago
- Code library for the Tesseract framework from 'TESSERACT: Eliminating experimental bias in malware classification across space and time' ☆19 · Dec 10, 2024 · Updated last year
- [ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M… ☆440 · Jan 22, 2025 · Updated last year
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] ☆589 · Apr 4, 2025 · Updated last year
- ☆14 · Jan 21, 2025 · Updated last year
- Code of paper: "xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking" ☆18 · Apr 3, 2026 · Updated last month
- Official Implementation of wd1 ☆29 · Sep 25, 2025 · Updated 7 months ago
- METR: Message Enhanced Tree-Ring ☆22 · Aug 19, 2024 · Updated last year
- Multi-encoder segmentation for contrail detection in satellite imagery | Google Researc ☆12 · Jan 28, 2026 · Updated 3 months ago
- [ICLR 2024] Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks ☆14 · Feb 6, 2024 · Updated 2 years ago
- ☆20 · Feb 11, 2024 · Updated 2 years ago
- Watermarking papers ☆17 · Mar 31, 2026 · Updated last month
- ☆26 · Mar 11, 2025 · Updated last year
- Official implementation for Neural networks with recurrent generative feedback (NeurIPS 2020). ☆22 · Nov 10, 2020 · Updated 5 years ago
- AutoHallusion Codebase (EMNLP 2024) ☆22 · Dec 6, 2024 · Updated last year
- [ICML 2025] An official source code for paper "FlipAttack: Jailbreak LLMs via Flipping". ☆172 · May 2, 2025 · Updated last year
- Open leaderboard for browser agents ☆35 · Apr 30, 2026 · Updated last week
- ☆22 · May 23, 2025 · Updated 11 months ago
- ☆52 · May 24, 2023 · Updated 2 years ago
- Lateral Inhibition-Inspired Convolutional Neural Network for Visual Attention and Saliency Detection ☆13 · Nov 6, 2020 · Updated 5 years ago
- Pytorch implementation of gradCAM, guidedBackProp, smoothGrad ☆13 · Mar 5, 2019 · Updated 7 years ago