☆22Oct 25, 2024Updated last year
Alternatives and similar repositories for AgentAttack
Users who are interested in AgentAttack are comparing it to the libraries listed below.
- ☆186Oct 31, 2025Updated 6 months ago
- Internal Consistency Regularization (CROW) for LLM Backdoor Elimination - Paper accepted to ICML 2025☆16May 6, 2025Updated last year
- ☆40Oct 2, 2024Updated last year
- ☆13Nov 17, 2024Updated last year
- Code and dataset for the paper: "Can Editing LLMs Inject Harm?" [AAAI'26]☆21Dec 26, 2025Updated 4 months ago
- ☆137Jul 2, 2024Updated last year
- Repository for the Paper: Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Inj…☆19Apr 17, 2026Updated last month
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents☆137Feb 19, 2025Updated last year
- [USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models☆268Jan 27, 2026Updated 3 months ago
- This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA…☆31Jun 1, 2024Updated last year
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment☆28Jun 11, 2025Updated 11 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning☆97May 23, 2024Updated last year
- ☆13May 18, 2024Updated 2 years ago
- ☆95Mar 13, 2026Updated 2 months ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM☆87Nov 3, 2024Updated last year
- Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.☆57Nov 13, 2023Updated 2 years ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.☆88Jan 19, 2025Updated last year
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.☆130Apr 15, 2024Updated 2 years ago
- ☆22May 23, 2025Updated 11 months ago
- PyTorch implementation of NPAttack☆12Jul 7, 2020Updated 5 years ago
- Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"☆62Aug 8, 2024Updated last year
- True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning☆12Jul 6, 2022Updated 3 years ago
- Repository for USTC 2021 Spring Database Labs☆10Jul 6, 2021Updated 4 years ago
- TextCenGen introduces a dynamic adaptation of the blank region for text-friendly image generation, and enhances T2I model outcomes on arb…☆38Jul 7, 2025Updated 10 months ago
- Agent Security Bench (ASB)☆248Apr 16, 2026Updated last month
- Image Shortcut Squeezing: Countering Perturbative Availability Poisons with Compression☆14Mar 22, 2025Updated last year
- Automated Question-Answering Over Knowledge Graphs in O&M of Wind Turbines☆13Aug 16, 2022Updated 3 years ago
- ☆18Jan 3, 2025Updated last year
- ☆11Oct 18, 2022Updated 3 years ago
- Computer Organization and Design labs @USTC, 2021 Spring☆11Jun 28, 2021Updated 4 years ago
- ☆73Feb 16, 2025Updated last year
- An Autonomous Curriculum Reinforcement Learning framework that steers agents to continually learn in specific environments with zero huma…☆32May 13, 2026Updated last week
- clock_plot provides a simple way to visualize timeseries data, mapping 24 hours onto the 360 degrees of a polar plot☆15Apr 5, 2022Updated 4 years ago
- Symmetric Encryption with Language Models☆13Jun 13, 2023Updated 2 years ago
- [ICLR 2023, Spotlight] Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning☆31Dec 2, 2023Updated 2 years ago
- This repository contains data and code used for On the Risk of Misinformation Pollution with Large Language Models (EMNLP 2023 Findings).☆17Dec 14, 2023Updated 2 years ago
- ☆12Apr 27, 2022Updated 4 years ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use☆203Mar 22, 2024Updated 2 years ago
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding☆152Jul 19, 2024Updated last year