[ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection".
☆33 · Aug 4, 2025 · Updated 7 months ago
Alternatives and similar repositories for AGrail4Agent
Users who are interested in AGrail4Agent compare it to the repositories listed below.
- [NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agen… ☆39 · Feb 14, 2026 · Updated 2 weeks ago
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆69 · Oct 23, 2024 · Updated last year
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur… ☆88 · May 9, 2025 · Updated 9 months ago
- Code for the ICCV 2025 paper "IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves" ☆17 · Jul 11, 2025 · Updated 7 months ago
- [ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models" ☆23 · Feb 7, 2026 · Updated 3 weeks ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… ☆71 · Feb 9, 2026 · Updated 3 weeks ago
- ☆28 · Feb 27, 2025 · Updated last year
- [ACL 2025] The official implementation of the paper "PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free". ☆60 · Dec 4, 2025 · Updated 3 months ago
- ☆21 · Jul 26, 2025 · Updated 7 months ago
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents ☆124 · Feb 19, 2025 · Updated last year
- A package that achieves 95%+ transfer attack success rate against GPT-4 ☆26 · Oct 24, 2024 · Updated last year
- [CVPR 2023] The official implementation of our CVPR 2023 paper "Detecting Backdoors During the Inference Stage Based on Corruption Robust… ☆24 · May 25, 2023 · Updated 2 years ago
- ☆105 · Aug 11, 2025 · Updated 6 months ago
- The code for WWW2024 paper "Rethinking Cross-Domain Sequential Recommendation under Open-World Assumptions". ☆35 · Aug 12, 2024 · Updated last year
- Vstream - Video Analytics pipeline with Hardware based accelerations (dev - stage) ☆10 · Feb 2, 2024 · Updated 2 years ago
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆87 · Jul 24, 2025 · Updated 7 months ago
- [CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge ☆39 · Sep 17, 2025 · Updated 5 months ago
- ☆37 · Oct 2, 2024 · Updated last year
- This is the code of ICLR 2022 Oral paper 'Non-Transferable Learning: A New Approach for Model Ownership Verification and Applicability Au… ☆30 · Oct 22, 2023 · Updated 2 years ago
- Test LLMs against jailbreaks and unprecedented harms ☆40 · Oct 19, 2024 · Updated last year
- [NeurIPS-2023] Annual Conference on Neural Information Processing Systems ☆228 · Dec 22, 2024 · Updated last year
- Overcooked! 2 TAS Development Framework ☆10 · Aug 18, 2023 · Updated 2 years ago
- ☆43 · Feb 9, 2026 · Updated 3 weeks ago
- Official implementation of the WASP web agent security benchmark ☆71 · Aug 12, 2025 · Updated 6 months ago
- A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor ☆30 · Jan 13, 2026 · Updated last month
- Entry for the 2020 First Hunan Province Artificial Intelligence Competition ☆11 · Feb 17, 2022 · Updated 4 years ago
- [EMNLP 2025 Oral] IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents ☆16 · Sep 16, 2025 · Updated 5 months ago
- YOLO object detection algorithm ☆15 · Jul 27, 2025 · Updated 7 months ago
- ☆16 · Jan 16, 2025 · Updated last year
- Precision Knowledge Editing (PKE): A novel method to reduce toxicity in LLMs while preserving performance, with robust evaluations and ha… ☆11 · Nov 26, 2024 · Updated last year
- An implementation of MSSRM method ☆11 · Mar 23, 2023 · Updated 2 years ago
- The Pair App is employed by the Agency of Learning for team management and communication. ☆10 · Apr 13, 2024 · Updated last year
- Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] ☆109 · Sep 27, 2024 · Updated last year
- ☆15 · Feb 11, 2025 · Updated last year
- Exploring advanced prompting tools to query SQL database with multiple tables in natural language using LLMs ☆16 · Aug 23, 2024 · Updated last year
- ☆11 · Sep 8, 2023 · Updated 2 years ago
- Official implementation of "Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation" ☆13 · Apr 15, 2024 · Updated last year
- Code for Rethinking Prompt Optimizers: From Prompt Merits to Optimization ☆12 · Jan 12, 2026 · Updated last month
- [COLM 2024] LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models ☆14 · Jan 4, 2025 · Updated last year