zhaoyiran924/Probe-Sampling

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/zhaoyiran924/Probe-Sampling)

zhaoyiran924 / Probe-Sampling

[NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling

☆35

Alternatives and similar repositories for Probe-Sampling

Users that are interested in Probe-Sampling are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

real-absolute-AI / Unnatural_Language
View on GitHub
The official repository of 'Unnatural Language Are Not Bugs but Features for LLMs'
☆24May 20, 2025Updated last year
weizeming / momentum-attack-llm
View on GitHub
☆25Jan 17, 2025Updated last year
jiah-li / magic
View on GitHub
The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.
☆15Dec 16, 2024Updated last year
sail-sg / I-FSJ
View on GitHub
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
☆65Jan 11, 2025Updated last year
sail-sg / ActivePRM
View on GitHub
☆21Apr 16, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
UCSC-VLAA / AttnGCG-attack
View on GitHub
[TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
☆27Jun 17, 2025Updated last year
Algorithmic-Alignment-Lab / CommonClaim
View on GitHub
Explore, Establish, Exploit: Red Teaming Language Models from Scratch
☆15Jun 21, 2023Updated 3 years ago
PKU-ML / PAT
View on GitHub
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆22May 6, 2025Updated last year
OSU-NLP-Group / AmpleGCG
View on GitHub
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆87Nov 3, 2024Updated last year
Huang-yihao / Personalization-based_backdoor
View on GitHub
☆12Dec 18, 2024Updated last year
GraySwanAI / nanoGCG
View on GitHub
A fast + lightweight implementation of the GCG algorithm in PyTorch
☆344May 13, 2025Updated last year
jiaxiaojunQAQ / I-GCG
View on GitHub
Improved techniques for optimization-based jailbreaking on large language models (ICLR2025)
☆146Apr 7, 2025Updated last year
TrustAIRLab / VoiceJailbreakAttack
View on GitHub
Code for Voice Jailbreak Attacks Against GPT-4o.
☆38May 31, 2024Updated 2 years ago
Skytliang / SpyGame
View on GitHub
SpyGame: An interactive multi-agent framework to evaluate intelligence with large language models :D
☆15Nov 9, 2023Updated 2 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
UNHSAILLab / working-memory-attack-on-llms
View on GitHub
Working Memory Attack on LLMs
☆18May 27, 2025Updated last year
dxlong2000 / FormatBiasEval
View on GitHub
Official codes for NAACL 2025 paper "LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias …
☆11Nov 25, 2025Updated 7 months ago
snu-mllab / Bayesian-Red-Teaming
View on GitHub
About Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23)
☆15Jul 9, 2023Updated 3 years ago
vis-nlp / OpenCQA
View on GitHub
☆13Jun 20, 2023Updated 3 years ago
verazuo / prompt-stealing-attack
View on GitHub
[USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models
☆53Jan 11, 2025Updated last year
thu-coai / TransferAttack
View on GitHub
[ACL 2025] Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
☆19May 23, 2025Updated last year
AI45Lab / CodeAttack
View on GitHub
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆62Oct 1, 2025Updated 9 months ago
searchableai / ChainCQG
View on GitHub
☆13Feb 11, 2021Updated 5 years ago
ntunlp / LLMSanitize
View on GitHub
An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).
☆62Aug 13, 2024Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
facebookresearch / jailbreak-objectives
View on GitHub
Code and data to go with the Zhu et al. paper "An Objective for Nuanced LLM Jailbreaks"
☆37Jul 2, 2026Updated 3 weeks ago
dapurv5 / awesome-red-teaming-llms
View on GitHub
Papers from our SoK on Red-Teaming (Accepted at TMLR)
☆45Jun 17, 2026Updated last month
schauppi / Self-Rewarding-Language-Models
View on GitHub
☆50May 13, 2024Updated 2 years ago
boyiwei / alignment-attribution-code
View on GitHub
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆91Mar 30, 2025Updated last year
vermouthdky / SimTeG
View on GitHub
Official Repo of SimTeG
☆43Mar 29, 2024Updated 2 years ago
poloclub / robust-principles
View on GitHub
Robust Principles: Architectural Design Principles for Adversarially Robust CNNs
☆24Jan 13, 2024Updated 2 years ago
val-iisc / DAJAT
View on GitHub
Official Code for Efficient and Effective Augmentation Strategy for Adversarial Training (NeurIPS-2022)
☆17Mar 29, 2023Updated 3 years ago
LIONS-EPFL / Charmer
View on GitHub
Revisiting Character-level Adversarial Attacks for Language Models, ICML 2024
☆19Feb 12, 2025Updated last year
dt-3t / Transformer-en-to-cn
View on GitHub
使用Transformer进行中英翻译（demo）
☆17Aug 25, 2023Updated 2 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
DAMO-NLP-SG / AdamergeX
View on GitHub
☆11Apr 2, 2024Updated 2 years ago
fra31 / rlhf-trojan-competition-submission
View on GitHub
☆19Feb 25, 2024Updated 2 years ago
Yu-Fangxu / COLD-Attack
View on GitHub
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆176Dec 18, 2024Updated last year
facebookresearch / advprompter
View on GitHub
Official implementation of AdvPrompter https//arxiv.org/abs/2404.16873
☆183May 6, 2024Updated 2 years ago
yjw1029 / Self-Reminder-Data
View on GitHub
Data for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder"
☆20Oct 26, 2023Updated 2 years ago
xpf / Data-Efficient-Backdoor-Attacks
View on GitHub
Data-Efficient Backdoor Attacks
☆20Jun 15, 2022Updated 4 years ago
tmllab / 2025_ICLR_PiF
View on GitHub
☆40May 17, 2025Updated last year