OSU-NLP-Group/AgentAttack

OSU-NLP-Group / AgentAttack

☆22

Alternatives and similar repositories for AgentAttack

Users who are interested in AgentAttack are comparing it to the libraries listed below.

OSU-NLP-Group / EIA_against_webagent
☆40 · Oct 2, 2024 · Updated last year
OSU-NLP-Group / AgentSafety
☆192 · Oct 31, 2025 · Updated 8 months ago
Dakingrai / ood-generalization-semantic-boundary-techniques
☆13 · Nov 17, 2024 · Updated last year
baixianghuang / editing-attack
Code and dataset for the paper: "Can Editing LLMs Inject Harm?" [AAAI'26]
☆21 · Dec 26, 2025 · Updated 6 months ago
MurongYue / LLM_MoT_cascade
This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA…
☆32 · Jun 1, 2024 · Updated 2 years ago
Sadcardation / ImageProtector
Repository for the Paper: Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Inj…
☆19 · Apr 17, 2026 · Updated 3 months ago
real-absolute-AI / Unnatural_Language
The official repository of 'Unnatural Language Are Not Bugs but Features for LLMs'
☆24 · May 20, 2025 · Updated last year
wbopan / safety-residual-space
Multi-dimensional analysis of orthogonal safety directions in LLM alignment
☆22 · Jun 12, 2026 · Updated last month
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆99 · May 23, 2024 · Updated 2 years ago
chanchimin / AgentMonitor
Code for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems"
☆13 · Dec 13, 2024 · Updated last year
nvedant07 / Fairness-Through-Robustness
Official code for FAccT'21 paper "Fairness Through Robustness: Investigating Robustness Disparity in Deep Learning" https://arxiv.org/abs…
☆13 · Mar 9, 2021 · Updated 5 years ago
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆87 · Nov 3, 2024 · Updated last year
yjw1029 / Self-Reminder
Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.
☆57 · Nov 13, 2023 · Updated 2 years ago
panorama-research / mobstr-dataset
The MobSTr dataset provides artifacts that demonstrate Model-based Safety Assurance and Traceability for a safety-critical automotive sys…
☆10 · Mar 18, 2022 · Updated 4 years ago
Yunhao-Feng / AgentHazard
☆28 · Jun 13, 2026 · Updated last month
dki-lab / few-shot-bioIE
True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning
☆12 · Jul 6, 2022 · Updated 4 years ago
AI-secure / Knowledge-Enhanced-Machine-Learning-Pipeline
Repository for Knowledge Enhanced Machine Learning Pipeline (KEMLP)
☆10 · Jun 5, 2021 · Updated 5 years ago
OSU-NLP-Group / SeeActChromeExtension
☆18 · Jan 3, 2025 · Updated last year
liuzrcc / ImageShortcutSqueezing
Image Shortcut Squeezing: Countering Perturbative Availability Poisons with Compression
☆14 · Mar 22, 2025 · Updated last year
jylee425 / mobilesafetybench
Evaluating Safety of Autonomous Agents in Mobile Device Control (AAAI 2026 AI Alignment Track)
☆34 · Jan 28, 2026 · Updated 5 months ago
sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆285 · Jan 27, 2026 · Updated 5 months ago
joyjitchatterjee / WindTurbine-QAKG
Automated Question-Answering Over Knowledge Graphs in O&M of Wind Turbines
☆14 · Aug 16, 2022 · Updated 3 years ago
SALT-NLP / PopupAttack
Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups
☆51 · Dec 23, 2024 · Updated last year
ChenWu98 / agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
☆139 · Feb 19, 2025 · Updated last year
sokcertifiedrobustness / VeriGauge-deprecated
☆11 · Oct 18, 2022 · Updated 3 years ago
amayuelas / multi-agent-attack
MultiAgent Attack
☆15 · Oct 3, 2024 · Updated last year
rain152 / PAT
[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning
☆11 · Oct 29, 2024 · Updated last year
OSU-NLP-Group / SELM
Symmetric Encryption with Language Models
☆13 · Jun 13, 2023 · Updated 3 years ago
kaiwenzha / contrastive-poisoning
[ICLR 2023, Spotlight] Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning
☆32 · Dec 2, 2023 · Updated 2 years ago
microsoft / text-to-sql-schema-expansion-generalization
Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion
☆13 · Jul 26, 2023 · Updated 2 years ago
uw-nsl / SafeDecoding
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆154 · Jul 19, 2024 · Updated 2 years ago
wang2226 / Trojan-Activation-Attack
[CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.
☆30 · Jul 29, 2024 · Updated last year
hannxu123 / fair_robust
☆12 · Apr 27, 2022 · Updated 4 years ago
kleeeeea / ECON
☆15 · Apr 6, 2020 · Updated 6 years ago
facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆98 · Jul 2, 2026 · Updated 2 weeks ago
jonathanherzig / zero-shot-semantic-parsing
Author implementation of the paper "Decoupling Structure and Lexicon for Zero-Shot Semantic Parsing"
☆18 · Nov 2, 2018 · Updated 7 years ago
sunlab-osu / IterPrompt
☆19 · Nov 7, 2022 · Updated 3 years ago
neelsjain / baseline-defenses
Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models"
☆34 · Oct 26, 2023 · Updated 2 years ago
itsvaibhav01 / Immune
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
☆28 · Jun 11, 2025 · Updated last year