facebookresearch/Meta_SecAlign

Repo for the paper "Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks".

☆70

Alternatives and similar repositories for Meta_SecAlign

Users who are interested in Meta_SecAlign are comparing it to the repositories listed below.

facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆98 · Jul 2, 2026 · Updated 2 weeks ago
Greysahy / ipiguard
[EMNLP 2025 Oral] IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents
☆22 · Sep 16, 2025 · Updated 10 months ago
yizhu-joy / DataFilter
☆15 · Nov 29, 2025 · Updated 7 months ago
sleeepeer / PIArena
[ACL 2026] PIArena: A Platform for Prompt Injection Evaluation
☆39 · Apr 28, 2026 · Updated 2 months ago
sleeepeer / PISanitizer
PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
☆18 · Dec 10, 2025 · Updated 7 months ago
SaFo-Lab / DRIFT
[NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agen…
☆58 · Updated this week
uiuc-kang-lab / InjecAgent
☆152 · Jul 2, 2024 · Updated 2 years ago
facebookresearch / rl-injector
Official release of code for the paper "RL is a hammer and LLMs are nails: A simple RL approach to stronger prompt injection attacks"
☆53 · May 6, 2026 · Updated 2 months ago
ethz-spylab / agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
☆670 · Jun 2, 2026 · Updated last month
facebookresearch / prompt-siren
A research workbench for developing and testing attacks against large language models, with a focus on prompt injection vulnerabilities a…
☆54 · Updated this week
SaFo-Lab / AgentDyn
The official implementation of the paper "AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?"
☆68 · May 19, 2026 · Updated 2 months ago
Sizhe-Chen / DefensiveToken
Repo for the paper "Defending Against Prompt Injection With a Few DefensiveTokens"
☆19 · Nov 10, 2025 · Updated 8 months ago
facebookresearch / wasp
Official implementation of the WASP web agent security benchmark
☆98 · Apr 13, 2026 · Updated 3 months ago
CHATS-lab / ToolShield
[ICML 2026] Official implementation for paper "Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Ag…
☆28 · Jul 6, 2026 · Updated 2 weeks ago
sunblaze-ucb / progent
Progent: Securing AI Agents with Privilege Control
☆41 · May 14, 2026 · Updated 2 months ago
liu00222 / Open-Prompt-Injection
This repository provides a benchmark for prompt injection attacks and defenses in LLMs
☆465 · Oct 29, 2025 · Updated 8 months ago
microsoft / TaskTracker
TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a…
☆92 · Sep 1, 2025 · Updated 10 months ago
albert-y1n / PISmith
PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses
☆21 · Updated this week
RPC2 / AutoInject
☆20 · Jun 12, 2026 · Updated last month
Unispac / Fight-Poison-With-Poison
Code repository for the paper: [USENIX Security 2023] Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
☆31 · Jul 11, 2023 · Updated 3 years ago
NISPLab / CleanSheet
Code and full version of the paper "Hijacking Attacks against Neural Network by Analyzing Training Data"
☆14 · Feb 28, 2024 · Updated 2 years ago
MurrayTom / ToolSafe
Official Implementation of "ToolSafe: Enhancing Tool Invocation Safety of LLM-based Agents via Proactive Step-level Guardrail and Feedbac…
☆70 · Mar 25, 2026 · Updated 3 months ago
SaFo-Lab / AGrail4Agent
[ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection".
☆42 · Aug 4, 2025 · Updated 11 months ago
yjw1029 / EmbMarker
Code and data for our paper "Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark"…
☆52 · Jul 11, 2023 · Updated 3 years ago
ys-zong / VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
☆90 · Jan 19, 2025 · Updated last year
facebookresearch / advprompter
Official implementation of AdvPrompter https://arxiv.org/abs/2404.16873
☆183 · May 6, 2024 · Updated 2 years ago
meet-cjli / CTRL
An Embarrassingly Simple Backdoor Attack on Self-supervised Learning
☆22 · Jan 24, 2024 · Updated 2 years ago
uiuc-kang-lab / AdaptiveAttackAgent
☆38 · Mar 12, 2025 · Updated last year
Kim-Minseon / APGP
Automatic Jailbreaking of the Text-to-Image Generative AI Systems
☆15 · Jun 23, 2024 · Updated 2 years ago
JuniMay / llm.rs
An attempt to migrate Karpathy's llm.c to safe Rust.
☆13 · Jun 4, 2024 · Updated 2 years ago
Jinxiaolong1129 / Foot-in-the-door-Jailbreak
☆23 · May 14, 2025 · Updated last year
DPamK / BadAgent
☆33 · Feb 27, 2025 · Updated last year
lancopku / agent-backdoor-attacks
Code & Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆115 · Sep 27, 2024 · Updated last year
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆186 · Apr 23, 2025 · Updated last year
SaFo-Lab / ReasoningBomb
[CCS 2026] The official implementation of our CCS 2026 paper "ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathological…
☆15 · Jun 24, 2026 · Updated 3 weeks ago
tedbackdoordefense / ted
☆22 · Dec 14, 2023 · Updated 2 years ago
bigglesworthnotacat / LLM-Steg
[ICLR 2026 Oral] Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
☆20 · Mar 22, 2026 · Updated 3 months ago
inspire-group / MIAdefenseSELENA
[USENIX Security 2022] Mitigating Membership Inference Attacks by Self-Distillation Through a Novel Ensemble Architecture
☆16 · Aug 29, 2022 · Updated 3 years ago
Gwinhen / DRUPE
Distribution Preserving Backdoor Attack in Self-supervised Learning
☆20 · Jan 27, 2024 · Updated 2 years ago