byerose / Awesome-Foundation-Model-Security

A curated list of trustworthy Generative AI papers. Daily updating...

☆75

Alternatives and similar repositories for Awesome-Foundation-Model-Security

Users who are interested in Awesome-Foundation-Model-Security are comparing it to the repositories listed below.

lancopku / agent-backdoor-attacks
Code & Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆102 Updated last year
qingjiesjtu / USC
This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.
☆63 Updated 11 months ago
papersPapers / BadPrompt
Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts"
☆40 Updated last year
cnut1648 / Model-Fingerprint
Fingerprint large language models
☆46 Updated last year
phycholosogy / RAG-privacy
The code for paper "The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)", exploring the privacy risk o…
☆60 Updated 10 months ago
xyq7 / GradSafe
Official Code for ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis"
☆60 Updated last year
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆43 Updated last year
Lyz1213 / BadEdit
☆36 Updated last year
GodXuxilie / PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
☆107 Updated 10 months ago
thunlp / OpenBackdoor
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
☆196 Updated 2 years ago
inspire-group / RobustRAG
☆21 Updated last year
thu-ml / Attack-Bard
☆107 Updated last year
Sizhe-Chen / StruQ
Official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries
☆52 Updated 3 weeks ago
Unispac / Visual-Adversarial-Examples-Jailbreak-Large-Language-Models
Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models
☆256 Updated last year
ShiJiawenwen / JudgeDeceiver
[CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge
☆36 Updated 2 months ago
safr-ai-lab / survey-llm
A survey of privacy problems in Large Language Models (LLMs). Contains summary of the corresponding paper along with relevant code
☆68 Updated last year
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆77 Updated last year
arobey1 / smooth-llm
☆114 Updated 2 years ago
rotaryhammer / code-autodan
An unofficial implementation of AutoDAN attack on LLMs (arXiv:2310.15140)
☆44 Updated last year
lapisrocks / rpo
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
☆59 Updated last year
AI45Lab / CodeAttack
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆55 Updated 2 months ago
SproutNan / AI-Safety_SCAV
This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"
☆46 Updated last month
HKUST-KnowComp / LLM-Multistep-Jailbreak
Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT
☆35 Updated 2 years ago
llm-editing / editing-attack
Code and dataset for the paper: "Can Editing LLMs Inject Harm?"
☆21 Updated last year
AI45Lab / ActorAttack
☆112 Updated 10 months ago
wegodev2 / virtual-prompt-injection
Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆26 Updated last year
UCF-ML-Research / TrojLLM
☆26 Updated last year
TrustAIResearch / MLHospital
☆44 Updated 2 years ago
facebookresearch / advprompter
Official implementation of AdvPrompter (https://arxiv.org/abs/2404.16873)
☆171 Updated last year
CryptoAILab / MergeGuard
[CCS-LAMPS'24] LLM IP Protection Against Model Merging
☆16 Updated last year