SaFo-Lab/AGrail4Agent

SaFo-Lab / AGrail4Agent

[ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection".

☆42

Alternatives and similar repositories for AGrail4Agent

Users who are interested in AGrail4Agent are comparing it to the repositories listed below.

guardagent / code
☆47 · Dec 9, 2025 · Updated 7 months ago
SaFo-Lab / DoxBench
[ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models"
☆30 · Feb 7, 2026 · Updated 5 months ago
SaFo-Lab / JailBreakV_28K
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
☆96 · May 9, 2025 · Updated last year
SaFo-Lab / ReasoningBomb
[CCS 2026] The official implementation of our CCS 2026 paper "ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathological…
☆15 · Jun 24, 2026 · Updated last month
wangbo9719 / MEXTRA
Source code for the ACL 2025 paper "Unveiling Privacy Risks in LLM Agent Memory"
☆34 · Dec 2, 2025 · Updated 7 months ago
Astarojth / AgentAuditor-ASSEBench
☆39 · May 29, 2026 · Updated last month
Zhang-Henry / INACTIVE
The official implementation of CVPR 2025 paper "Invisible Backdoor Attack against Self-supervised Learning"
☆19 · Jul 5, 2025 · Updated last year
aifinlab / Spider-Sense
☆21 · Feb 6, 2026 · Updated 5 months ago
MurrayTom / ToolSafe
Official Implementation of "ToolSafe: Enhancing Tool Invocation Safety of LLM-based Agents via Proactive Step-level Guardrail and Feedbac…
☆74 · Mar 25, 2026 · Updated 3 months ago
SaFo-Lab / DynAuditClaw
DynAuditClaw: A security audit skill that dynamically discovers your OpenClaw agent's real configuration, designs targeted attack scenar…
☆15 · Apr 6, 2026 · Updated 3 months ago
m4p1e / agent-sentinel
AgentSentinel: An End-to-End and Real-Time Security Defense Framework for Computer-Use Agents
☆35 · Aug 31, 2025 · Updated 10 months ago
SheltonLiu-N / Universal-Prompt-Injection
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆73 · Oct 23, 2024 · Updated last year
SaFo-Lab / AgentDyn
The official implementation of the paper "AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?"
☆68 · May 19, 2026 · Updated 2 months ago
MingyuJ666 / Disentangling-Memory-and-Reasoning
[ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.
☆87 · Nov 2, 2025 · Updated 8 months ago
kaijiezhu11 / MELON
[ICML'25] MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents
☆37 · Jul 31, 2025 · Updated 11 months ago
sleeepeer / PIArena
[ACL 2026] PIArena: A Platform for Prompt Injection Evaluation
☆41 · Apr 28, 2026 · Updated 2 months ago
facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆98 · Jul 2, 2026 · Updated 3 weeks ago
SaFo-Lab / seclaw
🦾 SeClaw: The Security Armored Personal AI Assistant
☆31 · Mar 18, 2026 · Updated 4 months ago
Greysahy / ipiguard
[EMNLP 2025 Oral] IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents
☆22 · Sep 16, 2025 · Updated 10 months ago
M0gician / RaccoonBench
[ACL 2024] Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications
☆18 · Apr 9, 2026 · Updated 3 months ago
thu-coai / Agent-SafetyBench
☆149 · Aug 11, 2025 · Updated 11 months ago
albert-y1n / PISmith
PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses
☆22 · Jul 17, 2026 · Updated last week
sunblaze-ucb / progent
Progent: Securing AI Agents with Privilege Control
☆41 · May 14, 2026 · Updated 2 months ago
SaFo-Lab / AdaShield
[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…
☆73 · Feb 9, 2026 · Updated 5 months ago
TanqiuJiang / AgentLAB
The official implementation of the paper "AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks"
☆27 · Jun 1, 2026 · Updated last month
TangciuYueng / AMemGuard
☆11 · Jul 2, 2026 · Updated 3 weeks ago
CHATS-lab / ToolShield
[ICML 2026] Official implementation for paper "Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Ag…
☆29 · Jul 6, 2026 · Updated 2 weeks ago
Sizhe-Chen / StruQ
Official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries
☆77 · Nov 10, 2025 · Updated 8 months ago
ozyyshr / RAST
Reasoning Activation in LLMs via Small Model Transfer (NeurIPS 2025)
☆22 · Oct 16, 2025 · Updated 9 months ago
khuangaf / ZeroFEC
Official implementation of the ACL 2023 paper: "Zero-shot Faithful Factual Error Correction"
☆17 · Aug 14, 2023 · Updated 2 years ago
facebookresearch / Meta_SecAlign
Repo for the paper "Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks".
☆70 · Jun 11, 2026 · Updated last month
OSU-NLP-Group / AgentSafety
☆192 · Oct 31, 2025 · Updated 8 months ago
uiuc-kang-lab / InjecAgent
☆153 · Jul 2, 2024 · Updated 2 years ago
roywang021 / IDEATOR
Code for the ICCV 2025 paper "IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves"
☆18 · Jul 11, 2025 · Updated last year
SaFo-Lab / MetaAgent
Official Repository of MetaAgent Program
☆53 · Dec 2, 2025 · Updated 7 months ago
agiresearch / ASB
Agent Security Bench (ASB)
☆273 · Apr 16, 2026 · Updated 3 months ago
yunqing-me / AttackVLM
[NeurIPS-2023] Annual Conference on Neural Information Processing Systems
☆231 · Dec 22, 2024 · Updated last year
WujiangXu / AMID
The code for WWW2024 paper "Rethinking Cross-Domain Sequential Recommendation under Open-World Assumptions".
☆37 · Aug 12, 2024 · Updated last year
uiuc-kang-lab / AdaptiveAttackAgent
☆39 · Mar 12, 2025 · Updated last year