Panda Guard is designed for researching jailbreak attacks, defenses, and evaluation algorithms for large language models (LLMs).
☆65 · Jan 19, 2026 · Updated last month
Alternatives and similar repositories for panda-guard
Users interested in panda-guard compare it with the repositories listed below.
- ☆122 · Feb 3, 2025 · Updated last year
- Code Implementation of Adversarial Prompt Evaluation paper ☆14 · Sep 18, 2025 · Updated 5 months ago
- Official implementation of “Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models” (AAAI 2026). ☆31 · Dec 17, 2025 · Updated 2 months ago
- ☆57 · May 21, 2025 · Updated 9 months ago
- [CVPR2025] T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation ☆32 · Jul 10, 2025 · Updated 7 months ago
- The official repository for guided jailbreak benchmark ☆29 · Jul 28, 2025 · Updated 7 months ago
- [NAACL 2022] "SemAttack: Natural Textual Attacks via Different Semantic Spaces" by Boxin Wang, Chejian Xu, Xiangyu Liu, Yu Cheng, Bo Li ☆21 · Jun 11, 2022 · Updated 3 years ago
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models ☆60 · Apr 8, 2024 · Updated last year
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models ☆34 · Oct 19, 2023 · Updated 2 years ago
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization ☆29 · Jul 9, 2024 · Updated last year
- Auditing agents for fine-tuning safety ☆20 · Oct 21, 2025 · Updated 4 months ago
- ☆36 · Aug 28, 2025 · Updated 6 months ago
- ☆55 · Updated this week
- ☆34 · Nov 12, 2024 · Updated last year
- ☆10 · Oct 21, 2024 · Updated last year
- ☆12 · Jun 11, 2025 · Updated 8 months ago
- ☆18 · Feb 16, 2025 · Updated last year
- ☆11 · Oct 31, 2024 · Updated last year
- Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning ☆30 · Sep 29, 2025 · Updated 5 months ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark (evaluation set for high school olympiad mathematics competitions) ☆11 · Mar 27, 2025 · Updated 11 months ago
- ☆23 · Dec 11, 2025 · Updated 2 months ago
- ☆10 · Jul 13, 2024 · Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆188 · Apr 1, 2025 · Updated 11 months ago
- ☆10 · Oct 28, 2020 · Updated 5 years ago
- Improving neural network representations using human similarity judgments ☆13 · Nov 22, 2024 · Updated last year
- Long Context Research ☆29 · Jan 26, 2026 · Updated last month
- ☆19 · May 14, 2025 · Updated 9 months ago
- Dataset for the Tensor Trust project ☆48 · Mar 17, 2024 · Updated last year
- ☆17 · Feb 6, 2026 · Updated last month
- Some binary utilities for Android reverse engineering ☆13 · Apr 28, 2020 · Updated 5 years ago
- sealos deck ☆11 · Mar 30, 2024 · Updated last year
- ☆12 · Feb 20, 2016 · Updated 10 years ago
- ☆21 · Jul 8, 2025 · Updated 8 months ago
- ☆13 · May 15, 2025 · Updated 9 months ago
- kernel module for modifying device information... ☆22 · Sep 24, 2025 · Updated 5 months ago
- ☆10 · Jul 24, 2023 · Updated 2 years ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Oct 11, 2024 · Updated last year
- ☆48 · Feb 25, 2026 · Updated last week
- Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models ☆12 · Jun 21, 2024 · Updated last year