MurrayTom/ToolSafe

MurrayTom / ToolSafe

Official Implementation of "ToolSafe: Enhancing Tool Invocation Safety of LLM-based Agents via Proactive Step-level Guardrail and Feedback"

☆70

Alternatives and similar repositories for ToolSafe

Users interested in ToolSafe are comparing it to the repositories listed below.

Greysahy / ipiguard
[EMNLP 2025 Oral] IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents
☆22 · Sep 16, 2025 · Updated 10 months ago
CHATS-lab / ToolShield
[ICML 2026] Official implementation for paper "Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Ag…
☆28 · Jul 6, 2026 · Updated 2 weeks ago
TanqiuJiang / AgentLAB
The official implementation of the paper "AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks"
☆26 · Jun 1, 2026 · Updated last month
Yunhao-Feng / AgentHazard
☆28 · Jun 13, 2026 · Updated last month
SaFo-Lab / AGrail4Agent
[ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection".
☆42 · Aug 4, 2025 · Updated 11 months ago
aisa-group / skill-inject
Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
☆88 · Jul 1, 2026 · Updated 2 weeks ago
AI45Lab / skill-safety-bench
☆29 · May 14, 2026 · Updated 2 months ago
Astarojth / AgentAuditor-ASSEBench
☆39 · May 29, 2026 · Updated last month
AI45Lab / DeepSafe
All-in-One Safety Evaluation Framework
☆51 · Updated this week
Claw-Guard / ClawGuard
☆24 · May 12, 2026 · Updated 2 months ago
ethz-spylab / agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
☆670 · Jun 2, 2026 · Updated last month
agiresearch / ASB
Agent Security Bench (ASB)
☆271 · Apr 16, 2026 · Updated 3 months ago
HanjiangHu / NBF-LLM
The official code for "Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks".
☆18 · Jun 24, 2026 · Updated 3 weeks ago
SaFo-Lab / AgentDyn
The official implementation of the paper "AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?"
☆68 · May 19, 2026 · Updated 2 months ago
TangciuYueng / AMemGuard
☆11 · Jul 2, 2026 · Updated 2 weeks ago
ZZR0 / CodeAttack
Adversarial Attack for Pre-trained Code Models
☆10 · Jul 19, 2022 · Updated 4 years ago
xjzzzzzzzz / MCPSafety
☆22 · Dec 18, 2025 · Updated 7 months ago
S2yyyy / OpenClaw-Analysis
☆31 · Mar 11, 2026 · Updated 4 months ago
WisdomShell / ADG
[ACL'26 Main Conference] Instruction Data Selection via Answer Divergence
☆22 · Apr 14, 2026 · Updated 3 months ago
wangbo9719 / MEXTRA
Source code for the ACL 2025 paper "Unveiling privacy risks in LLM agent memory"
☆34 · Dec 2, 2025 · Updated 7 months ago
ModelTC / Prototype
☆14 · Feb 3, 2026 · Updated 5 months ago
salman-lui / x-teaming
☆68 · May 21, 2025 · Updated last year
RUC-NLPIR / ET-Agent
☆20 · Jan 18, 2026 · Updated 6 months ago
Xuzhenhua55 / awesome-llm-copyright-protection
A curated collection of research and techniques for protecting intellectual property of large language models, including watermarking, fi…
☆52 · Jun 10, 2026 · Updated last month
kaijiezhu11 / MELON
[ICML'25] MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents
☆36 · Jul 31, 2025 · Updated 11 months ago
yuki-younai / MTSA
Official implementation of MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
☆16 · Jun 2, 2025 · Updated last year
facebookresearch / Meta_SecAlign
Repo for the paper "Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks".
☆70 · Jun 11, 2026 · Updated last month
ZexuSun / AgentSkiller
☆30 · Feb 11, 2026 · Updated 5 months ago
jianshuod / SafeSearch
[ICML 2026] Official implementation of "SafeSearch: Automated Red-Teaming of LLM-Based Search Agents"
☆19 · Mar 25, 2026 · Updated 3 months ago
Leey21 / data-lineage
Trace origins, shared sources, and contamination risk
☆25 · May 27, 2026 · Updated last month
sheep333c / DIVE
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
☆26 · Mar 13, 2026 · Updated 4 months ago
Zhow01 / SkillAttack
☆52 · May 19, 2026 · Updated 2 months ago
antgroup / Agent3Sigma-Canary
Agent3σ-Canary is an evaluation framework for AI Agent security in realistic runtime environments.
☆32 · Jun 24, 2026 · Updated 3 weeks ago
QingyuLiu / Agentic-Upward-Deception
Official implementation of "Are Your Agents Upward Deceivers?". The paper was accepted at ICML 2026.
☆24 · Dec 15, 2025 · Updated 7 months ago
SaFo-Lab / DRIFT
[NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agen…
☆58 · Updated this week
pasquini-dario / LLM_NeuralExec
Code to generate NeuralExecs (prompt injection for LLMs)
☆27 · Oct 5, 2025 · Updated 9 months ago
PiedPiper0709 / openclaw-malicious-skills
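This repository collects publicly disclosed malicious / suspicious OpenClaw Skills samples and classifies, grades, and documents their risks, for use in security research, platform governance, and risk detection.
☆20 · Mar 16, 2026 · Updated 4 months ago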
TrustAIRLab / HarmfulSkillBench
The Official Repository for Paper "HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?"
☆15 · May 2, 2026 · Updated 2 months ago
MurrayTom / SG-Bench
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
☆26 · Nov 29, 2024 · Updated last year