XHMY/AutoDefense

AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks

☆68

Alternatives and similar repositories for AutoDefense

Users who are interested in AutoDefense are comparing it to the libraries listed below.

poloclub / llm-self-defense
LLM Self Defense: By Self Examination, LLMs know they are being tricked
☆52 · May 21, 2024 · Updated 2 years ago
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29 · Jul 9, 2024 · Updated 2 years ago
yjw1029 / Self-Reminder
Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.
☆57 · Nov 13, 2023 · Updated 2 years ago
eurekayuan / RigorLLM
Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content"
☆24 · Jul 28, 2024 · Updated last year
UCSB-NLP-Chang / SemanticSmooth
Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing'
☆24 · Jun 9, 2024 · Updated 2 years ago
ledllm / ledllm
☆24 · Jun 16, 2024 · Updated 2 years ago
VITA-Group / Shake-to-Leak
[SatML 2024] Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk
☆16 · Mar 15, 2025 · Updated last year
YihanWang617 / LLM-Jailbreaking-Defense-Backtranslation
Code for paper "Defending against LLM Jailbreaking via Backtranslation"
☆34 · Aug 16, 2024 · Updated last year
Coding-Crashkurse / LangChain-EventDriven-Architecture
☆13 · Jul 15, 2023 · Updated 3 years ago
kyegomez / CogNetX
CogNetX is an advanced, multimodal neural network architecture inspired by human cognition. It integrates speech, vision, and video proce…
☆20 · Updated this week
THUYimingLi / DVBW
This is the official implementation of our paper 'Black-box Dataset Ownership Verification via Backdoor Watermarking'.
☆29 · May 1, 2026 · Updated 2 months ago
Jometeorie / KnowledgeSpread
☆40 · Oct 15, 2024 · Updated last year
uw-nsl / SafeDecoding
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆154 · Jul 19, 2024 · Updated 2 years ago
yzmar4real / ai_cybersecurity_compliance
AI-Powered CyberSecurity Compliance: Boost Network Security with OpenAI GPT-3.5-turbo
☆10 · May 18, 2023 · Updated 3 years ago
echohive42 / GPT-adversarial-defense
This repo focuses on defending against 'adversarial prompts,' detecting and attempting to mitigate objectionable content in real time.
☆14 · Jul 30, 2023 · Updated 2 years ago
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆116 · Sep 27, 2024 · Updated last year
hychaochao / Chat-Models-Backdoor-Attacking
Code for the paper "Exploring Backdoor Vulnerabilities of Chat Models"
☆19 · Apr 13, 2024 · Updated 2 years ago
Le0v1n / KnowledgeHub
🚀Knowledge All in One
☆10 · Jun 11, 2026 · Updated last month
SamuelGong / grad_attacks
Self-Teaching Notes on Gradient Leakage Attacks against GPT-2 models.
☆14 · Mar 18, 2024 · Updated 2 years ago
HengruiLou / CHN-DF
Large-Scale Chinese Data Benchmark for Face Video Anti-Forgery Identification
☆13 · Feb 26, 2025 · Updated last year
anakin87 / llms4devs
Conference talk: from zero to your first LLM application
☆17 · Jul 1, 2024 · Updated 2 years ago
ZiyueWang25 / llm-security-challenge
Can Large Language Models Solve Security Challenges? We test LLMs' ability to interact and break out of shell environments using the Over…
☆13 · Aug 21, 2023 · Updated 2 years ago
Coding-Crashkurse / RAG-Evaluation-with-Ragas
☆19 · Mar 2, 2024 · Updated 2 years ago
AI4Good24 / PsySafe
☆53 · Feb 8, 2025 · Updated last year
pigeon-dove / FGLA
FGLA: Fast Generation-Based Gradient Leakage Attacks against Highly Compressed Gradients
☆15 · Mar 17, 2026 · Updated 4 months ago
InvokerStark / OverKill
☆15 · Jun 13, 2024 · Updated 2 years ago
ChengshuaiZhao0 / The-Wolf-Within
☆13 · Jul 16, 2026 · Updated last week
SEU-WDS / MachineLearningCourses
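The summer machine learning seminar is a discussion and exchange activity initiated and organized by instructor Zhang Xiang, with all students taking part. Its goal is to help students consolidate the basic machine learning algorithms and master their underlying principles and use. Students choose a topic, prepare slides, and present them as a lecture to all participating students and advisors.
☆10 · Sep 19, 2018 · Updated 7 years ago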
anniesch / single-life-rl
Single-Life Reinforcement Learning
☆14 · Dec 17, 2022 · Updated 3 years ago
wslong20 / G-safeguard
☆42 · Jun 28, 2025 · Updated last year
shiningrain / JailGuard
☆32 · Mar 16, 2025 · Updated last year
Y-Xiang-hub / AdvEWM
This repository contains code for AdvEWM, as detailed in our paper published in JISA
☆18 · Mar 3, 2026 · Updated 4 months ago
mcao516 / Autoregressive-VAE
Transformer-based autoregressive variational autoencoder
☆12 · Feb 10, 2020 · Updated 6 years ago
sail-sg / P-DoS
[ArXiv 2025] Denial-of-Service Poisoning Attacks on Large Language Models
☆23 · Oct 22, 2024 · Updated last year
maozdemir / privateGPT-colab
☆16 · Aug 8, 2023 · Updated 2 years ago
Huangxy-Minel / System-Design-for-Federated-Learning
Paper list of federated learning: About system design
☆13 · Apr 13, 2022 · Updated 4 years ago
MBZUAI-CLeaR / IoE-Prompting
☆11 · Feb 28, 2024 · Updated 2 years ago
ubuntu733 / DM
CMU RavenClaw dialogue management
☆12 · Dec 13, 2017 · Updated 8 years ago
akshayballal95 / private_gpt
☆20 · Jun 4, 2023 · Updated 3 years ago