☆23Oct 25, 2024Updated last year
Alternatives and similar repositories for AgentAttack
Users that are interested in AgentAttack are comparing it to the libraries listed below
- ☆181Oct 31, 2025Updated 4 months ago
- ☆29Feb 27, 2025Updated last year
- ☆126Jul 2, 2024Updated last year
- ☆30Mar 13, 2026Updated last week
- Repository for the Paper: Refusing Safe Prompts for Multi-modal Large Language Models☆18Oct 16, 2024Updated last year
- [USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models☆245Jan 27, 2026Updated last month
- This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA…☆31Jun 1, 2024Updated last year
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment☆27Jun 11, 2025Updated 9 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning☆98May 23, 2024Updated last year
- [EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks☆10Nov 27, 2024Updated last year
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems"☆13Dec 13, 2024Updated last year
- Official code for FAccT'21 paper "Fairness Through Robustness: Investigating Robustness Disparity in Deep Learning" https://arxiv.org/abs…☆13Mar 9, 2021Updated 5 years ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.☆112Apr 15, 2024Updated last year
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs☆85Nov 3, 2024Updated last year
- Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.☆57Nov 13, 2023Updated 2 years ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.☆86Jan 19, 2025Updated last year
- Agent Security Bench (ASB)☆201Oct 27, 2025Updated 4 months ago
- An Autonomous Curriculum Reinforcement Learning framework that steers agents to continually learn in specific environments with zero huma…☆25Feb 25, 2026Updated 3 weeks ago
- Kubernetes cli (kubectl) powered by GPT☆15Apr 20, 2023Updated 2 years ago
- True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning☆12Jul 6, 2022Updated 3 years ago
- Automated Question-Answering Over Knowledge Graphs in O&M of Wind Turbines☆12Aug 16, 2022Updated 3 years ago
- This repository contains data and code used for On the Risk of Misinformation Pollution with Large Language Models (EMNLP 2023 Findings).☆16Dec 14, 2023Updated 2 years ago
- Lets an LLM control your PC or server via SSH or a terminal.☆25Sep 17, 2025Updated 6 months ago
- ☆72Feb 16, 2025Updated last year
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning☆11Oct 29, 2024Updated last year
- Symmetric Encryption with Language Models☆13Jun 13, 2023Updated 2 years ago
- Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning☆22Jul 8, 2024Updated last year
- The MobSTr dataset provides artifacts that demonstrate Model-based Safety Assurance and Traceability for a safety-critical automotive sys…☆10Mar 18, 2022Updated 4 years ago
- Implementation for <Understanding Robust Overfitting of Adversarial Training and Beyond> in ICML'22.☆13Jul 1, 2022Updated 3 years ago
- [ICLR 2023, Spotlight] Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning☆31Dec 2, 2023Updated 2 years ago
- Code for Voice Jailbreak Attacks Against GPT-4o.☆37May 31, 2024Updated last year
- ☆12Apr 27, 2022Updated 3 years ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use☆192Mar 22, 2024Updated 2 years ago
- ☆15Apr 6, 2020Updated 5 years ago
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding☆151Jul 19, 2024Updated last year
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"☆206Apr 12, 2025Updated 11 months ago
- Exercise solutions for "Stochastic Processes" (3rd edition) by Fang Zhaoben (方兆本)☆19Oct 31, 2022Updated 3 years ago
- Author implementation of the paper "Decoupling Structure and Lexicon for Zero-Shot Semantic Parsing"☆18Nov 2, 2018Updated 7 years ago
- ☆19May 23, 2025Updated 9 months ago