Helloworld10011/Adversarial-Reasoning

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Helloworld10011/Adversarial-Reasoning)

Helloworld10011 / Adversarial-Reasoning

A new algorithm that formulates jailbreaking as a reasoning problem.

☆26

Alternatives and similar repositories for Adversarial-Reasoning

Users that are interested in Adversarial-Reasoning are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

akumar2709 / OVERTHINK_public
View on GitHub
☆52Feb 25, 2026Updated 4 months ago
facebookresearch / jailbreak-objectives
View on GitHub
Code and data to go with the Zhu et al. paper "An Objective for Nuanced LLM Jailbreaks"
☆37Jul 2, 2026Updated last week
rishub-tamirisa / tamper-resistance
View on GitHub
[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
☆68Jun 9, 2025Updated last year
salman-lui / x-teaming
View on GitHub
☆67May 21, 2025Updated last year
wonderNefelibata / Awesome-LRM-Safety
View on GitHub
Awesome Large Reasoning Model(LRM) Safety.This repository is used to collect security-related research on large reasoning models such as …
☆84Updated this week
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
tmllab / 2025_ICLR_PiF
View on GitHub
☆40May 17, 2025Updated last year
wagner-group / prompt-injection-defense
View on GitHub
Fine-tuning base models to build robust task-specific models
☆36Apr 11, 2024Updated 2 years ago
jaehyun513 / P2T
View on GitHub
Official implementation of Tabular Transfer Learning via Prompting LLMs (COLM 2024).
☆13Aug 6, 2024Updated last year
franciscoliu / SKU
View on GitHub
Official code implementation of SKU, Accepted by ACL 2024 Findings
☆20Dec 18, 2024Updated last year
shinington / facesec
View on GitHub
Corresponding code to "FACESEC: A Fine-grained Robustness Evaluation Framework for Face Recognition Systems" @ CVPR 2021
☆13Jun 22, 2021Updated 5 years ago
MadryLab / pretraining-distribution-shift-robustness
View on GitHub
☆14Mar 4, 2024Updated 2 years ago
TomSheng21 / AdaptGuard
View on GitHub
ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation
☆11Dec 23, 2023Updated 2 years ago
OSU-NLP-Group / EIA_against_webagent
View on GitHub
☆40Oct 2, 2024Updated last year
VITA-Group / SEAL
View on GitHub
[COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆60Apr 6, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
shinington / Robust-PDF-Classifier-with-Conserved-Features
View on GitHub
Corresponding code to "Improving Robustness of ML Classifiers against Realizable Evasion Attacks Using Conserved Features" @ USENIX Secur…
☆11Aug 5, 2019Updated 6 years ago
YuheD / awesome-performance-evaluation
View on GitHub
☆11Jul 3, 2024Updated 2 years ago
DavidFanzz / llm_decoding
View on GitHub
☆12Apr 25, 2025Updated last year
kztakemoto / simbaja
View on GitHub
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
☆17Apr 24, 2024Updated 2 years ago
RICommunity / TAP
View on GitHub
TAP: An automated jailbreaking method for black-box LLMs
☆239Dec 10, 2024Updated last year
tim-learn / Generate_list
View on GitHub
Generate custom text files for dataloader within UDA methods
☆14May 24, 2023Updated 3 years ago
AI-secure / Robustness-Against-Backdoor-Attacks
View on GitHub
RAB: Provable Robustness Against Backdoor Attacks
☆40Oct 3, 2023Updated 2 years ago
kkkevinkkkkk / situated_faithfulness
View on GitHub
☆14Oct 17, 2024Updated last year
MadryLab / bias-transfer
View on GitHub
☆15Jul 24, 2022Updated 3 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
xirui-li / DrAttack
View on GitHub
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆69Aug 25, 2024Updated last year
tim-learn / NOUN
View on GitHub
code released for our TIP 2021 paper "Adversarial Domain Adaptation with Prototype-based Normalized Output Conditioner"
☆15May 24, 2023Updated 3 years ago
irenasaracay / model-equality-testing
View on GitHub
Test equality between a black-box LLM API and a reference distribution
☆18Oct 29, 2024Updated last year
ybwang119 / label_recovery
View on GitHub
[ICLR 2024] Towards Elminating Hard Label Constraints in Gradient Inverision Attacks
☆14Feb 6, 2024Updated 2 years ago
Sizhe-Chen / StruQ
View on GitHub
official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries
☆76Nov 10, 2025Updated 8 months ago
konpanousis / Adversarial-LWTA-AutoAttack
View on GitHub
☆12May 6, 2022Updated 4 years ago
wufeim / LychSim
View on GitHub
A controllable and interactive simulation framework for vision research.
☆16May 25, 2026Updated last month
yuleiqin / RAIF
View on GitHub
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆32Oct 9, 2025Updated 9 months ago
wxl-lxw / StringLLM
View on GitHub
[ICLR 2025] Official implementation for "StringLLM: Understanding the String Processing Capability of Large Language Models"
☆22Jan 23, 2025Updated last year
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
zknus / NeurIPS-2023-HANG-Robustness
View on GitHub
Adversarial Robustness in Graph Neural Networks: A Hamiltonian Energy Conservation Approach
☆16Apr 27, 2024Updated 2 years ago
ubc-tea / Local-Superior-Soups
View on GitHub
☆15Dec 10, 2024Updated last year
thefcraft / prompt-generator-stable-diffusion
View on GitHub
Prompt Generator model for Stable Diffusion Models
☆12Jun 20, 2023Updated 3 years ago
stanford-crfm / air-bench-2024
View on GitHub
AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies
☆30Aug 14, 2024Updated last year
xbq1994 / META
View on GitHub
ECCV 2022
☆16Aug 3, 2022Updated 3 years ago
inspire-group / tta_risk
View on GitHub
☆15Jun 6, 2023Updated 3 years ago
YancyKahn / CoA
View on GitHub
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆39Jan 17, 2025Updated last year