[ACL 2025] The official implementation of the paper "PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free".
☆60 · Dec 4, 2025 · Updated 3 months ago
Alternatives and similar repositories for PIGuard
Users who are interested in PIGuard are comparing it to the libraries listed below:
- [NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agen… ☆39 · Feb 14, 2026 · Updated 2 weeks ago
- Code for our NAACL2025 accepted paper: Attention Tracker: Detecting Prompt Injection Attacks in LLMs ☆23 · Sep 19, 2025 · Updated 5 months ago
- [ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection". ☆33 · Aug 4, 2025 · Updated 7 months ago
- ☆21 · Jul 26, 2025 · Updated 7 months ago
- ☆25 · Sep 3, 2025 · Updated 6 months ago
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur… ☆88 · May 9, 2025 · Updated 9 months ago
- The code of "Image-text Retrieval via Preserving Main Semantic of Vision" in ICME 2023. ☆15 · Dec 25, 2023 · Updated 2 years ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks. ☆106 · Apr 15, 2024 · Updated last year
- Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models" ☆16 · Mar 31, 2025 · Updated 11 months ago
- [ICML 2025] UDora: A Unified Red Teaming Framework against LLM Agents ☆31 · Jun 24, 2025 · Updated 8 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆18 · Aug 22, 2024 · Updated last year
- Red Queen Dataset and data generation template ☆26 · Dec 26, 2025 · Updated 2 months ago
- ☆18 · Mar 30, 2025 · Updated 11 months ago
- ☆23 · Jan 17, 2025 · Updated last year
- A Python library for guardrail models evaluation. ☆33 · Oct 9, 2025 · Updated 4 months ago
- [ArXiv 2025] Denial-of-Service Poisoning Attacks on Large Language Models ☆23 · Oct 22, 2024 · Updated last year
- Every practical and proposed defense against prompt injection. ☆645 · Feb 22, 2025 · Updated last year
- [IJCAI 2022] Official Pytorch code for paper “S2 Transformer for Image Captioning” ☆87 · Aug 14, 2024 · Updated last year
- Prompt Injection Attacks against GPT-4, Gemini, Azure, Azure with Jailbreak ☆29 · Oct 8, 2024 · Updated last year
- [NeurIPS 2023] The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" acce… ☆27 · May 14, 2024 · Updated last year
- A benchmark for prompt injection detection systems. ☆165 · Dec 16, 2025 · Updated 2 months ago
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆87 · Jul 24, 2025 · Updated 7 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state. ☆72 · May 22, 2025 · Updated 9 months ago
- [CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge ☆39 · Sep 17, 2025 · Updated 5 months ago
- Test LLMs against jailbreaks and unprecedented harms ☆40 · Oct 19, 2024 · Updated last year
- ☆12 · Jul 25, 2018 · Updated 7 years ago
- Automatically turns your RPI Pico into a bad usb. The pico-ducky is from dbisu. ☆11 · May 28, 2024 · Updated last year
- ☆22 · Dec 30, 2025 · Updated 2 months ago
- ☆10 · Feb 27, 2026 · Updated last week
- Official implementation of the WASP web agent security benchmark ☆71 · Aug 12, 2025 · Updated 6 months ago
- Free WaspBots Scripts ☆16 · Feb 21, 2026 · Updated last week
- Marathon Unity ☆13 · Oct 19, 2018 · Updated 7 years ago
- ☆29 · Dec 20, 2025 · Updated 2 months ago
- ☆44 · Feb 9, 2026 · Updated 3 weeks ago
- A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor ☆30 · Jan 13, 2026 · Updated last month
- A Swedish Natural Language Understanding Benchmark ☆11 · Dec 12, 2025 · Updated 2 months ago
- A cool, simple and efficient Minecraft Server Starter for Windows ☆11 · Sep 16, 2018 · Updated 7 years ago
- A static website for a Chatbot with Azure OpenAI, Azure Text to Speech Services and Live2D ☆13 · Sep 4, 2024 · Updated last year
- YOLO object detection algorithm ☆15 · Jul 27, 2025 · Updated 7 months ago