A new algorithm that formulates jailbreaking as a reasoning problem.
☆26Jul 2, 2025Updated 9 months ago
Alternatives and similar repositories for Adversarial-Reasoning
Users who are interested in Adversarial-Reasoning are comparing it to the libraries listed below.
- Code and data to go with the Zhu et al. paper "An Objective for Nuanced LLM Jailbreaks"☆36Apr 8, 2026Updated last week
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"☆66Jun 9, 2025Updated 10 months ago
- Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models☆32Oct 6, 2025Updated 6 months ago
- ☆62May 21, 2025Updated 10 months ago
- [ICCVW 2025 (Oral)] Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models☆29Oct 20, 2025Updated 5 months ago
- Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as …☆82Updated this week
- This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit.☆16Jun 28, 2024Updated last year
- Fine-tuning base models to build robust task-specific models☆35Apr 11, 2024Updated 2 years ago
- Official code implementation of SKU, Accepted by ACL 2024 Findings☆20Dec 18, 2024Updated last year
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation☆11Dec 23, 2023Updated 2 years ago
- ☆39Oct 2, 2024Updated last year
- [COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free☆56Apr 6, 2025Updated last year
- ☆10Jul 3, 2024Updated last year
- A Framework for Evaluating AI Agent Safety in Realistic Environments☆31Oct 2, 2025Updated 6 months ago
- TAP: An automated jailbreaking method for black-box LLMs☆227Dec 10, 2024Updated last year
- ☆12Apr 25, 2025Updated 11 months ago
- Copy, paste and move files like you do in Finder in Dired.☆14Nov 6, 2020Updated 5 years ago
- Generate custom text files for dataloader within UDA methods☆14May 24, 2023Updated 2 years ago
- Test equality between a black-box LLM API and a reference distribution☆13Oct 29, 2024Updated last year
- Codes for our CCL 2021 paper: Incorporating Commonsense Knowledge into Abstractive Dialogue Summarization via Heterogeneous Graph Network…☆26Jul 28, 2021Updated 4 years ago
- Official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries☆69Nov 10, 2025Updated 5 months ago
- ☆14Oct 17, 2024Updated last year
- Code released for our TIP 2021 paper "Adversarial Domain Adaptation with Prototype-based Normalized Output Conditioner"☆15May 24, 2023Updated 2 years ago
- RAB: Provable Robustness Against Backdoor Attacks☆39Oct 3, 2023Updated 2 years ago
- All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks☆18Apr 24, 2024Updated last year
- Corresponding code to "Improving Robustness of ML Classifiers against Realizable Evasion Attacks Using Conserved Features" @ USENIX Secur…☆11Aug 5, 2019Updated 6 years ago
- [ICLR 2025] Official implementation for "StringLLM: Understanding the String Processing Capability of Large Language Models"☆22Jan 23, 2025Updated last year
- ☆12May 6, 2022Updated 3 years ago
- [ICLR 2024] Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks☆14Feb 6, 2024Updated 2 years ago
- ☆15Dec 10, 2024Updated last year
- AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies☆30Aug 14, 2024Updated last year
- ECCV 2022☆16Aug 3, 2022Updated 3 years ago
- Corresponding code to "FACESEC: A Fine-grained Robustness Evaluation Framework for Face Recognition Systems" @ CVPR 2021☆13Jun 22, 2021Updated 4 years ago
- ☆199Nov 26, 2023Updated 2 years ago
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM☆39Jan 17, 2025Updated last year
- ☆14Jun 6, 2023Updated 2 years ago
- Code and data repository for "The Mirage of Model Editing: Revisiting Evaluation in the Wild"☆18Aug 27, 2025Updated 7 months ago
- Adversarial Robustness in Graph Neural Networks: A Hamiltonian Energy Conservation Approach☆16Apr 27, 2024Updated last year
- ☆13Oct 14, 2020Updated 5 years ago