AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
☆67 · Jan 15, 2026 · Updated last month
Alternatives and similar repositories for AutoDefense
Users who are interested in AutoDefense are comparing it to the libraries listed below.
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization ☆29 · Jul 9, 2024 · Updated last year
- ☆27 · Jun 28, 2025 · Updated 8 months ago
- CogNetX is an advanced, multimodal neural network architecture inspired by human cognition. It integrates speech, vision, and video proce… ☆19 · Feb 9, 2026 · Updated last month
- Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024) ☆17 · Oct 22, 2024 · Updated last year
- LLM Self Defense: By Self Examination, LLMs know they are being tricked ☆51 · May 21, 2024 · Updated last year
- [SatML 2024] Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk ☆16 · Mar 15, 2025 · Updated 11 months ago
- ☆28 · Mar 20, 2024 · Updated last year
- ☆19 · Mar 16, 2017 · Updated 8 years ago
- This repository contains code for AdvEWM, as detailed in our paper published in JISA ☆18 · Mar 3, 2026 · Updated last week
- ☆25 · Jun 16, 2024 · Updated last year
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding ☆151 · Jul 19, 2024 · Updated last year
- Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content" ☆22 · Jul 28, 2024 · Updated last year
- Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI. ☆56 · Nov 13, 2023 · Updated 2 years ago
- This is the official implementation of our paper 'Black-box Dataset Ownership Verification via Backdoor Watermarking'. ☆26 · Jul 22, 2023 · Updated 2 years ago
- quick playground to animate pippin ☆15 · Nov 11, 2024 · Updated last year
- ☆37 · Oct 15, 2024 · Updated last year
- Implementation of ECCV 2020 "Sparse Adversarial Attack via Perturbation Factorization" ☆27 · Aug 18, 2020 · Updated 5 years ago
- Panda Guard is designed for researching jailbreak attacks, defenses, and evaluation algorithms for large language models (LLMs). ☆65 · Jan 19, 2026 · Updated last month
- Code for paper "Defending against LLM Jailbreaking via Backtranslation" ☆34 · Aug 16, 2024 · Updated last year
- Data and code for Emotion Prediction Errors ☆10 · Feb 22, 2022 · Updated 4 years ago
- Root Repo for the EPOXY tool that applies Privilege Overlays on bare-metal systems ☆31 · May 18, 2017 · Updated 8 years ago
- Libraries, guides, blueprints, and sample code, to enable rapidly building 0-1 applications on iOS, Android and web. ☆11 · May 12, 2023 · Updated 2 years ago
- ☆10 · Dec 9, 2019 · Updated 6 years ago
- Open-source AI app builder | v0 / lovable / Bolt alternative | 🌟 Star if you like it! ☆15 · Jul 17, 2025 · Updated 7 months ago
- Notes about courses Machine Learning 2025 Spring by Hung-yi Lee ☆25 · Sep 22, 2025 · Updated 5 months ago
- ☆57 · May 21, 2025 · Updated 9 months ago
- FGLA: Fast Generation-Based Gradient Leakage Attacks against Highly Compressed Gradients ☆14 · Dec 20, 2022 · Updated 3 years ago
- ☆11 · Apr 6, 2019 · Updated 6 years ago
- A sample project for using Capstone from a driver in Visual Studio 2015 ☆36 · May 4, 2016 · Updated 9 years ago
- The PyTorch implementation of paper "KERMIT: Knowledge Graph Completion of Enhanced Relation Modeling with Inverse Transformation" ☆15 · Jul 4, 2025 · Updated 8 months ago
- IoT CVEs as abnormal events to evaluate a real-time host-based IDS. https://doi.org/10.1016/j.future.2022.03.001 ☆13 · Mar 16, 2022 · Updated 3 years ago
- Implementation for paper2repo ☆11 · Dec 7, 2020 · Updated 5 years ago
- A functional programming library for Python ☆17 · Dec 22, 2025 · Updated 2 months ago
- Failsafe value retrieval, modification and utils using json-pointer spec ☆14 · Updated this week
- A simple yet powerful data validator for javascript. ☆12 · Jan 7, 2023 · Updated 3 years ago
- code for paper Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning ☆44 · Mar 20, 2024 · Updated last year
- [EMNLP 2025 Oral] IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents ☆16 · Sep 16, 2025 · Updated 5 months ago
- ☆12 · Dec 21, 2024 · Updated last year
- Development repository for the Digital Terraria Lab implementation of the Sugarscape agent-based societal simulation. ☆15 · Feb 24, 2026 · Updated last week