thu-coai / Agent-SafetyBench
☆99 · Updated Aug 11, 2025
Alternatives and similar repositories for Agent-SafetyBench
Users that are interested in Agent-SafetyBench are comparing it to the libraries listed below
- Agent Security Bench (ASB) ☆182 · Updated Oct 27, 2025
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024) ☆99 · Updated Jan 11, 2026
- ☆23 · Updated Jan 17, 2025
- [ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection". ☆32 · Updated Aug 4, 2025
- The official repository for the guided jailbreak benchmark ☆28 · Updated Jul 28, 2025
- [ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models" ☆22 · Updated this week
- Reasoning Activation in LLMs via Small Model Transfer (NeurIPS 2025) ☆21 · Updated Oct 16, 2025
- ☆14 · Updated Jun 7, 2024
- Example agents for the Dreadnode platform ☆22 · Updated Dec 19, 2025
- First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and saf… ☆49 · Updated Dec 3, 2025
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆106 · Updated May 20, 2025
- MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols ☆27 · Updated Sep 24, 2025
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen… ☆85 · Updated Jun 20, 2025
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur… ☆85 · Updated May 9, 2025
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆431 · Updated Feb 3, 2026
- ☆174 · Updated Oct 31, 2025
- Accepted by ECCV 2024 ☆186 · Updated Oct 15, 2024
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety ☆53 · Updated Jul 21, 2025
- Efficient LLM query routing via multi-sampling. BEST-Route selects both model and number of responses based on query difficulty, cutting … ☆42 · Updated Aug 6, 2025
- ☆23 · Updated Oct 25, 2024
- Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] ☆109 · Updated Sep 27, 2024
- ☆23 · Updated Oct 11, 2024
- ☆28 · Updated Aug 31, 2025
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆68 · Updated Oct 23, 2024
- A lightweight library for large language model (LLM) jailbreaking defense. ☆61 · Updated Sep 11, 2025
- This tool allows local LLM usage that can automate tasks without human intervention. The agent can call itself recursively and work on… ☆20 · Updated May 5, 2025
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆272 · Updated Jul 28, 2025
- [NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling ☆33 · Updated Nov 8, 2024
- [ACL 2025] Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models ☆42 · Updated May 29, 2025
- AISafetyLab: A comprehensive framework covering safety attack, defense, evaluation and paper list. ☆230 · Updated Aug 29, 2025
- ☆35 · Updated May 21, 2025
- Auditing agents for fine-tuning safety ☆18 · Updated Oct 21, 2025
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks ☆32 · Updated Jul 9, 2024
- ☆70 · Updated Oct 1, 2025
- Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025] ☆77 · Updated Jan 23, 2025
- Security Threats related to MCP (Model Context Protocol), MCP Servers and more ☆45 · Updated Apr 24, 2025
- Code to break Llama Guard ☆32 · Updated Dec 7, 2023
- ☆37 · Updated Oct 2, 2024
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] ☆225 · Updated Sep 29, 2024