ShiJiawenwen/JudgeDeceiver

[CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge

☆41

Alternatives and similar repositories for JudgeDeceiver

Users interested in JudgeDeceiver are comparing it to the repositories listed below.

rainavyas / attack-comparative-assessment
Adversarial attack comparative assessment Large Language Model
☆13 · May 21, 2025 · Updated last year
scaleapi / mrt
https://scale.com/research/mrt
☆20 · Mar 16, 2026 · Updated 4 months ago
uiuc-kang-lab / AdaptiveAttackAgent
☆39 · Mar 12, 2025 · Updated last year
pasquini-dario / LLM_NeuralExec
Code to generate NeuralExecs (prompt injection for LLMs)
☆27 · Oct 5, 2025 · Updated 9 months ago
sail-sg / Cheating-LLM-Benchmarks
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
☆86 · Oct 23, 2024 · Updated last year
quchangle1 / COLT
The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models.
☆26 · Nov 6, 2024 · Updated last year
ucsdwcsng / EdgeRIC-A-real-time-RIC
☆24 · Sep 17, 2024 · Updated last year
AI-secure / UDora
[ICML 2025] UDora: A Unified Red Teaming Framework against LLM Agents
☆38 · Jun 24, 2025 · Updated last year
MidiyaZhu / MePO
Code for Rethinking Prompt Optimizers: From Prompt Merits to Optimization
☆13 · Jan 12, 2026 · Updated 6 months ago
AI4Bharat / FBI
FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists
☆31 · Aug 14, 2025 · Updated 11 months ago
Lyz1213 / Backdoored_PPLM
☆15 · Dec 12, 2023 · Updated 2 years ago
ZhangZhuoSJTU / LINT
☆17 · Sep 4, 2024 · Updated last year
adnansirajrakin / TBT-CVPR2020
This repository provides sample code implementing the Targeted Bit Trojan attack.
☆20 · Nov 7, 2020 · Updated 5 years ago
HowieHwong / MetaTool
[ICLR'24] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
☆115 · Mar 21, 2024 · Updated 2 years ago
aisa-group / skill-inject
Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
☆88 · Jul 1, 2026 · Updated 3 weeks ago
lan-lc / adversarial_example_of_Go
Attack AlphaZero Go agents (NeurIPS 2022)
☆22 · Dec 3, 2022 · Updated 3 years ago
liu00222 / Open-Prompt-Injection
This repository provides a benchmark for prompt injection attacks and defenses in LLMs
☆468 · Oct 29, 2025 · Updated 9 months ago
SaFo-Lab / ReasoningBomb
[CCS 2026] The official implementation of our CCS 2026 paper "ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathological…
☆15 · Jun 24, 2026 · Updated last month
weiyezhimeng / SQL-Injection-Jailbreak
☆22 · Jul 26, 2025 · Updated last year
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆116 · Sep 27, 2024 · Updated last year
kaijiezhu11 / MELON
[ICML'25] MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents
☆37 · Jul 31, 2025 · Updated 11 months ago
ebagdasa / adversarial_illusions
Code for "Adversarial Illusions in Multi-Modal Embeddings"
☆32 · Aug 4, 2024 · Updated last year
LLMSmith / LLMSmith
☆50 · Feb 26, 2025 · Updated last year
cua-framework / agents
☆23 · Jan 30, 2026 · Updated 5 months ago
IBM / URET
Universal Robustness Evaluation Toolkit (for Evasion)
☆32 · Sep 17, 2025 · Updated 10 months ago
CryptoAILab / MergeGuard
[CCS-LAMPS'24] LLM IP Protection Against Model Merging
☆16 · Oct 14, 2024 · Updated last year
aaronmueller / MIB
Landing page for MIB: A Mechanistic Interpretability Benchmark
☆26 · Aug 15, 2025 · Updated 11 months ago
Bowen1911 / xJailbreak
Code for the paper "xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking"
☆17 · Apr 3, 2026 · Updated 3 months ago
aifinlab / Spider-Sense
☆21 · Feb 6, 2026 · Updated 5 months ago
Nathangitlab / Backdoor-Attacks-on-Crowd-Counting
Code for the ACM MM paper "Backdoor Attack on Crowd Counting"
☆17 · Jul 10, 2022 · Updated 4 years ago
mengtong0110 / Tokenizer-MIA
[USENIX Security 2026] Membership Inference Attacks on Tokenizers of Large Language Models
☆21 · May 22, 2026 · Updated 2 months ago
PurCL / ProSec
Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment"
☆18 · Feb 26, 2026 · Updated 5 months ago
facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆98 · Jul 2, 2026 · Updated 3 weeks ago
2019ChenGong / Offline_RL_Poisoner
[S&P 2024] Replication Package for "Mind Your Data! Hiding Backdoors in Offline Reinforcement Learning Datasets".
☆33 · Dec 30, 2024 · Updated last year
bigglesworthnotacat / LLM-Steg
[ICLR 2026 Oral] Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
☆20 · Mar 22, 2026 · Updated 4 months ago
sejoonoh / ATR
Code and data for the ACM CIKM 2024 paper "Adversarial Text Rewriting for Text-aware Recommender Systems"
☆12 · Aug 1, 2024 · Updated last year
wicai24 / DOOR-Alignment
☆20 · Apr 7, 2025 · Updated last year
XuandongZhao / Ginsew
[ICML 2023] Protecting Language Generation Models via Invisible Watermarking
☆13 · Sep 8, 2023 · Updated 2 years ago
huawei-lin / UniGuardian
The implementation for paper "UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in …
☆17 · Jul 3, 2025 · Updated last year