qingjiesjtu / USC

This is the code repository for our submission, "Understanding the Dark Side of LLMs’ Intrinsic Self-Correction".

☆63

Alternatives and similar repositories for USC

Users who are interested in USC are comparing it to the repositories listed below.

neelsjain / baseline-defenses
Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models"
☆28 Updated 2 years ago
byerose / Awesome-Foundation-Model-Security
A curated list of trustworthy Generative AI papers. Daily updating...
☆74 Updated last year
DeepLearningSecurityGroup / Cyber_Security_Reading_Group
☆14 Updated last month
csdongxian / ANP_backdoor
Code for the NeurIPS 2021 paper "Adversarial Neuron Pruning Purifies Backdoored Deep Models"
☆58 Updated 2 years ago
lancopku / codable-watermarking-for-llm
Repository for Towards Codable Watermarking for Large Language Models
☆38 Updated 2 years ago
wagner-group / MarkMyWords
☆31 Updated last year
Eyr3 / TextCRS
Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (IEEE S&P 2024)
☆34 Updated 4 months ago
Lyz1213 / BadEdit
☆36 Updated last year
thu-ml / Attack-Bard
☆105 Updated last year
PKU-ML / PAT
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆19 Updated 5 months ago
inspire-group / RobustRAG
☆21 Updated last year
umd-huang-lab / VLM-Poisoning
Code for the NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models"
☆55 Updated 9 months ago
papersPapers / BadPrompt
Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts"
☆40 Updated last year
AI-secure / MMDT
Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models
☆22 Updated 7 months ago
SolidShen / RIPPLE_official
☆20 Updated last year
AI45Lab / CodeAttack
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆54 Updated 3 weeks ago
reds-lab / Meta-Sift
The official implementation of USENIX Security'23 paper "Meta-Sift" -- Ten minutes or less to find a 1000-size or larger clean subset on …
☆19 Updated 2 years ago
ZhentingWang / DIAGNOSIS
☆22 Updated last year
KuofengGao / Verbose_Images
[ICLR 2024] Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
☆40 Updated last year
zihao-ai / unthinking_vulnerability
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
☆32 Updated 5 months ago
jinyuan-jia / BadEncoder
☆84 Updated 4 years ago
rain152 / PAT
[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning
☆10 Updated last year
SproutNan / AI-Safety_SCAV
This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"
☆45 Updated 2 weeks ago
AntigoneRandy / SIREN
Official Implementation for "Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models" (IE…
☆23 Updated 7 months ago
Gwinhen / BackdoorVault
A toolbox for backdoor attacks.
☆22 Updated 2 years ago
SCLBD / DBD
☆31 Updated 3 years ago
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
☆53 Updated last year
cnut1648 / Model-Fingerprint
Fingerprint large language models
☆45 Updated last year
yuplin2333 / representation-space-jailbreak
Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794…
☆22 Updated last year
LLMSecurity / MasterKey
MASTERKEY is a framework designed to explore and exploit vulnerabilities in large language model chatbots by automating jailbreak attacks…
☆27 Updated last year