Lucas-TY/llm_Implicit_reference

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Lucas-TY/llm_Implicit_reference)

Lucas-TY / llm_Implicit_reference

Official Implementation of implicit reference attack

☆11

Alternatives and similar repositories for llm_Implicit_reference

Users that are interested in llm_Implicit_reference are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

multimodalpragmatic / multimodalpragmatic
View on GitHub
☆14Jan 14, 2026Updated 6 months ago
zhiyichin / P4D
View on GitHub
[ICML 2024] Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts (Official Pytorch Implementati…
☆51Jan 11, 2026Updated 6 months ago
ZhangZhuoSJTU / LINT
View on GitHub
☆17Sep 4, 2024Updated last year
ShoumikSaha / agent-skill-security
View on GitHub
☆15May 13, 2026Updated 2 months ago
mdbond / 3341-Public
View on GitHub
Shared code for CSE 3341
☆10Dec 13, 2022Updated 3 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
PKU-ML / PAT
View on GitHub
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆22May 6, 2025Updated last year
Django-Jiang / BadChain
View on GitHub
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆56Jul 24, 2024Updated last year
UCSC-VLAA / AttnGCG-attack
View on GitHub
[TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
☆27Jun 17, 2025Updated last year
Bowen1911 / xJailbreak
View on GitHub
Code of paper: xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking"
☆17Apr 3, 2026Updated 3 months ago
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
View on GitHub
☆61Jun 5, 2024Updated 2 years ago
SALT-NLP / search_privacy_risk
View on GitHub
Code for the paper "Searching Privacy Risks in Multi-Agent Systems via Simulation"
☆24Oct 13, 2025Updated 9 months ago
Yuchen413 / text2image_safety
View on GitHub
☆202Apr 7, 2025Updated last year
weizeming / momentum-attack-llm
View on GitHub
☆25Jan 17, 2025Updated last year
rrgeorge-pdcontributions / NSFW-Words-List
View on GitHub
Text file containing NSFW words aggregated from various sources.
☆12Aug 23, 2020Updated 5 years ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
LIONS-EPFL / Charmer
View on GitHub
Revisiting Character-level Adversarial Attacks for Language Models, ICML 2024
☆19Feb 12, 2025Updated last year
LukasStruppek / Exploiting-Cultural-Biases-via-Homoglyphs
View on GitHub
[Journal of Artificial Intelligence Research] Source code for our paper "Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synth…
☆12Jan 8, 2024Updated 2 years ago
MaTengSYSU / HIMRD-jailbreak
View on GitHub
Code repository for the paper "Heuristic Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models"
☆19Aug 7, 2025Updated 11 months ago
GuanlinLee / ART
View on GitHub
Official Code for ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users (NeurIPS 2024)
☆25Oct 23, 2024Updated last year
datar001 / Awesome-AD-on-T2IDM
View on GitHub
A collection of resources on attacks and defenses targeting text-to-image diffusion models
☆101Dec 20, 2025Updated 7 months ago
huizhang-L / CodeChameleon
View on GitHub
☆30Mar 20, 2024Updated 2 years ago
datar001 / Revealing-Vulnerabilities-in-Stable-Diffusion-via-Targeted-Attacks
View on GitHub
☆11Sep 10, 2024Updated last year
cruiseresearchgroup / COCOA
View on GitHub
COCOA: Cross Modality Contrastive Learning for Sensor Data
☆26Sep 11, 2022Updated 3 years ago
aranciokov / FSMMDA_VideoRetrieval
View on GitHub
☆10Nov 23, 2023Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
xlhex / extract_and_transfer
View on GitHub
☆10Jun 5, 2021Updated 5 years ago
KxPlaug / TAA-Bench
View on GitHub
☆13Feb 1, 2024Updated 2 years ago
jiah-li / magic
View on GitHub
The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.
☆15Dec 16, 2024Updated last year
AbrahamJobs / SoftwareHomework
View on GitHub
☆13May 1, 2016Updated 10 years ago
ChengshuaiZhao0 / The-Wolf-Within
View on GitHub
☆13Updated this week
compsec-snu / pfi
View on GitHub
PFI: Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents
☆30Mar 26, 2025Updated last year
chiayi-hsu / Ring-A-Bell
View on GitHub
☆45Jan 15, 2025Updated last year
yxoh / prompt_leak_usenix2024
View on GitHub
☆15May 5, 2026Updated 2 months ago
HuangZhiChao95 / ATAS
View on GitHub
☆12Oct 29, 2023Updated 2 years ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
SilverRAN / Adversarial-Attack-Methods-Summary
View on GitHub
Awesome-Adversarial-Attack-Methods-Summary
☆13Jul 24, 2024Updated last year
neulab / ToM-Language-Acquisition
View on GitHub
Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind".
☆15Apr 27, 2023Updated 3 years ago
verazuo / prompt-stealing-attack
View on GitHub
[USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models
☆53Jan 11, 2025Updated last year
YancyKahn / CoA
View on GitHub
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆39Jan 17, 2025Updated last year
DiLi-Lab / ScanDL
View on GitHub
☆14Apr 29, 2025Updated last year
ErxinYu / CoSafe-Dataset
View on GitHub
☆13Nov 12, 2024Updated last year
McGill-NLP / AdversarialTriggers
View on GitHub
TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models
☆19Aug 17, 2025Updated 11 months ago