Astarojth / AgentAuditor-ASSEBench
☆18 · Updated last week
Alternatives and similar repositories for AgentAuditor-ASSEBench
Users interested in AgentAuditor-ASSEBench are comparing it to the repositories listed below.
- Code & Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] ☆94 · Updated last year
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents ☆110 · Updated 8 months ago
- The official repository for the guided jailbreak benchmark ☆24 · Updated 3 months ago
- ☆23 · Updated 9 months ago
- This repository contains the source code, datasets, and scripts for the paper "GenderCARE: A Comprehensive Framework for Assessing and Re… ☆25 · Updated last year
- Code and data for the paper "A Semantic Invariant Robust Watermark for Large Language Models", accepted at ICLR 2024 ☆34 · Updated 11 months ago
- The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem. | Oyster … ☆55 · Updated last month
- Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning" ☆77 · Updated 8 months ago
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks ☆32 · Updated last year
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆65 · Updated 9 months ago
- ☆42 · Updated 5 months ago
- ☆109 · Updated 9 months ago
- [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion ☆54 · Updated last month
- [CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge ☆31 · Updated last month
- ☆22 · Updated 4 months ago
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents ☆53 · Updated 8 months ago
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM jailbreaking. (NeurIPS 2024) ☆152 · Updated 11 months ago
- ☆69 · Updated 2 months ago
- Agent Security Bench (ASB) ☆141 · Updated last week
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆162 · Updated 6 months ago
- ☆42 · Updated 7 months ago
- An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024) ☆105 · Updated 9 months ago
- ☆66 · Updated 7 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆99 · Updated last year
- ☆111 · Updated last year
- [NeurIPS'24] Protecting Your LLMs with Information Bottleneck ☆21 · Updated 11 months ago
- Official code implementation of SKU, accepted to ACL 2024 Findings ☆18 · Updated 10 months ago
- ☆20 · Updated last year
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆72 · Updated 3 months ago
- ☆43 · Updated 2 years ago