max-andr/adversarial-random-search-gpt4

Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023]

☆43

Alternatives and similar repositories for adversarial-random-search-gpt4

Users that are interested in adversarial-random-search-gpt4 are comparing it to the libraries listed below.

tml-epfl / long-is-more-for-alignment
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]
☆21 · May 2, 2024 · Updated 2 years ago
YanNeu / spurious_imagenet
Spurious Features Everywhere - Large-Scale Detection of Harmful Spurious Features in ImageNet
☆32 · Aug 22, 2023 · Updated 2 years ago
fra31 / rlhf-trojan-competition-submission
☆19 · Feb 25, 2024 · Updated 2 years ago
nmndeep / revisiting-at
[NeurIPS 2023] Code for the paper "Revisiting Adversarial Training for ImageNet: Architectures, Training and Generalization across Threa…
☆39 · Dec 3, 2024 · Updated last year
j-cb / GOOD
Provable Worst Case Guarantees for the Detection of Out-of-Distribution Data
☆13 · Sep 20, 2022 · Updated 3 years ago
ethz-spylab / rlhf_trojan_competition
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
☆119 · Jun 13, 2024 · Updated 2 years ago
valentyn1boreiko / SVCEs_code
☆13 · Jun 23, 2022 · Updated 4 years ago
BrachioLab / adversarial_prompting
☆53 · May 24, 2023 · Updated 3 years ago
jhayes14 / black-box-attacks
Comparison of gradient estimation techniques for black-box adversarial examples
☆11 · Oct 31, 2018 · Updated 7 years ago
SchwinnL / circuit-breakers-eval
Independent robustness evaluation of "Improving Alignment and Robustness with Short Circuiting"
☆18 · Apr 15, 2025 · Updated last year
tml-epfl / sgd-sparse-features
SGD with large step sizes learns sparse features [ICML 2023]
☆34 · Apr 24, 2023 · Updated 3 years ago
fra31 / fab-attack
Code for FAB-attack
☆34 · Jul 10, 2020 · Updated 6 years ago
eth-lre / LLM_ICL
ACL24
☆11 · Jun 7, 2024 · Updated 2 years ago
tml-epfl / llm-adaptive-attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
☆391 · Jan 23, 2025 · Updated last year
fra31 / robust-finetuning
Code for the paper "Adversarial robustness against multiple and single $l_p$-threat models via quick fine-tuning of robust classifiers"
☆19 · Nov 30, 2022 · Updated 3 years ago
JonasGeiping / carving
Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives
☆71 · Feb 22, 2024 · Updated 2 years ago
inspire-group / proxy-distributions
[ICLR 2022 official code] Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness?
☆29 · Mar 15, 2022 · Updated 4 years ago
fra31 / sparse-rs
Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks
☆45 · Feb 24, 2022 · Updated 4 years ago
parameterlab / leaky_thoughts
Source code of "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" EMNLP 2025
☆17 · Jan 12, 2026 · Updated 6 months ago
CHATS-lab / persuasive_jailbreaker
Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!
☆363 · Oct 17, 2025 · Updated 9 months ago
andyzoujm / breaking-llama-guard
Code to break Llama Guard
☆32 · Dec 7, 2023 · Updated 2 years ago
sail-sg / Cheating-LLM-Benchmarks
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
☆86 · Oct 23, 2024 · Updated last year
j-cb / Breaking_Down_OOD_Detection
☆12 · Feb 19, 2025 · Updated last year
qizhangli / MoreBayesian-attack
Code for our ICLR 2023 paper Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples.
☆18 · May 31, 2023 · Updated 3 years ago
markusschanta / talks
Slides and materials for various talks I've given
☆17 · Dec 21, 2022 · Updated 3 years ago
y0mingzhang / diffuse-distributions
Forcing Diffuse Distributions out of Language Models
☆18 · Sep 10, 2024 · Updated last year
M4xim4l / DiG-IN
DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactu…
☆10 · Oct 9, 2024 · Updated last year
Princeton-SysML / Jailbreak_LLM
☆203 · Nov 26, 2023 · Updated 2 years ago
rainavyas / attack-comparative-assessment
Adversarial attacks on comparative assessment with Large Language Models
☆13 · May 21, 2025 · Updated last year
openai / safety-rbr-code-and-data
Code and example data for the paper: Rule Based Rewards for Language Model Safety
☆209 · Jul 19, 2024 · Updated 2 years ago
parameterlab / trap
Source code of "TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification", ACL2024 (findings)
☆15 · Nov 20, 2024 · Updated last year
ChenWu98 / agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
☆140 · Feb 19, 2025 · Updated last year
Framartin / lgv-geometric-transferability
Source of the ECCV22 paper "LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity"
☆18 · Mar 12, 2025 · Updated last year
aengusl / latent-adversarial-training
☆48 · Sep 29, 2024 · Updated last year
LLM-QC / judgezoo
A collection of judges for evaluating LLM output for safety & toxicity with a standardized API.
☆15 · Jan 7, 2026 · Updated 6 months ago
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆73 · Sep 25, 2024 · Updated last year
brendel-group / objects-compositional-generalization
Official code for the paper "Provable Compositional Generalization for Object-Centric Learning" (ICLR 2024, oral)
☆16 · Aug 26, 2024 · Updated last year
parsonsmatt / beginner-error-messages
Informative error messages for common beginner misunderstandings with Haskell
☆15 · Aug 29, 2019 · Updated 6 years ago
zhxieml / remiss-jailbreak
☆33 · Jun 24, 2024 · Updated 2 years ago