Alternatives and similar repositories for AgentSafety
AgentSafety · ☆175 · Updated Oct 31, 2025
Users interested in AgentSafety are comparing it to the libraries listed below.
- ☆23 · Updated Oct 25, 2024
- Agent Security Bench (ASB) · ☆186 · Updated Oct 27, 2025
- Code & data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] · ☆109 · Updated Sep 27, 2024
- ☆12 · Updated May 6, 2022
- ☆59 · Updated Jun 5, 2024
- [CVPR 2025] Official repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment · ☆27 · Updated Jun 11, 2025
- [EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks · ☆10 · Updated Nov 27, 2024
- ☆37 · Updated Oct 2, 2024
- Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models · ☆27 · Updated Mar 15, 2025
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents · ☆124 · Updated Feb 19, 2025
- A Survey on Jailbreak Attacks and Defenses against Multimodal Generative Models · ☆308 · Updated Jan 11, 2026
- A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide… · ☆1,783 · Updated Feb 1, 2026
- [ICLR 2025] Detecting Backdoor Samples in Contrastive Language Image Pretraining · ☆19 · Updated Feb 26, 2025
- Accepted by IJCAI-24 Survey Track · ☆231 · Updated Aug 25, 2024
- A lightweight library for large language model (LLM) jailbreaking defense · ☆61 · Updated Sep 11, 2025
- ☆28 · Updated Oct 14, 2021
- A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.) · ☆1,870 · Updated Feb 23, 2026
- Repository for the paper "Refusing Safe Prompts for Multi-modal Large Language Models" · ☆18 · Updated Oct 16, 2024
- [ICML 2025] UDora: A Unified Red Teaming Framework against LLM Agents · ☆32 · Updated Jun 24, 2025
- ☆75 · Updated Jan 21, 2026
- A fast + lightweight implementation of the GCG algorithm in PyTorch · ☆318 · Updated May 13, 2025
- Code repository for the paper "Towards A Proactive ML Approach for Detecting Backdoor Poison Samples" [USENIX Security 2023] · ☆30 · Updated Jul 11, 2023
- ☆39 · Updated May 17, 2025
- Safety at Scale: A Comprehensive Survey of Large Model Safety · ☆228 · Updated Feb 3, 2026
- Code for the paper "Defending against LLM Jailbreaking via Backtranslation" · ☆34 · Updated Aug 16, 2024
- ☆22 · Updated May 28, 2025
- ☆109 · Updated Feb 16, 2024
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" · ☆86 · Updated Nov 28, 2023
- Code repository for the paper "Heuristic Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models" · ☆15 · Updated Aug 7, 2025
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" · ☆199 · Updated Apr 12, 2025
- Papers and resources related to the security and privacy of LLMs 🤖 · ☆568 · Updated Jun 8, 2025
- This repository provides a benchmark for prompt injection attacks and defenses in LLMs · ☆396 · Updated Oct 29, 2025
- A curation of awesome tools, documents and projects about LLM Security · ☆1,530 · Updated Aug 20, 2025
- [S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models · ☆19 · Updated Feb 18, 2025
- [NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge · ☆100 · Updated this week
- ☆105 · Updated Aug 11, 2025
- On the Robustness of GUI Grounding Models Against Image Attacks · ☆12 · Updated Apr 8, 2025
- True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning · ☆12 · Updated Jul 6, 2022
- ☆118 · Updated Jul 2, 2024