Beijing-AISI / panda-guard
Panda Guard is designed for researching jailbreak attacks, defenses, and evaluation algorithms for large language models (LLMs).
☆61 Jan 19, 2026 Updated 3 weeks ago
Alternatives and similar repositories for panda-guard
Users who are interested in panda-guard are comparing it to the libraries listed below.
- ☆121 Feb 3, 2025 Updated last year
- ☆55 May 21, 2025 Updated 8 months ago
- [CVPR2025] T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation ☆32 Jul 10, 2025 Updated 7 months ago
- AISafetyLab: A comprehensive framework covering safety attack, defense, evaluation and paper list. ☆228 Aug 29, 2025 Updated 5 months ago
- [NAACL 2022] "SemAttack: Natural Textual Attacks via Different Semantic Spaces" by Boxin Wang, Chejian Xu, Xiangyu Liu, Yu Cheng, Bo Li ☆21 Jun 11, 2022 Updated 3 years ago
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models ☆34 Oct 19, 2023 Updated 2 years ago
- Auditing agents for fine-tuning safety ☆18 Oct 21, 2025 Updated 3 months ago
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization ☆29 Jul 9, 2024 Updated last year
- ☆36 Aug 28, 2025 Updated 5 months ago
- ☆34 Nov 12, 2024 Updated last year
- learn llvm from scratch ☆14 Apr 29, 2023 Updated 2 years ago
- ☆22 Dec 11, 2025 Updated 2 months ago
- ☆11 Oct 31, 2024 Updated last year
- ☆10 Jul 13, 2024 Updated last year
- choose demo ☆20 Nov 6, 2025 Updated 3 months ago
- ☆18 Feb 16, 2025 Updated last year
- ☆12 Jun 11, 2025 Updated 8 months ago
- [2025 Shanghai AI Laboratory InternLM Training Camp Top Ten and Outstanding Project] ☆43 Sep 22, 2025 Updated 4 months ago
- ☆14 Feb 26, 2025 Updated 11 months ago
- ☆21 Jun 16, 2025 Updated 8 months ago
- ☆15 Jun 10, 2022 Updated 3 years ago
- A toolkit for testing and improving named entity recognition [ESEC/FSE'23] ☆11 Aug 31, 2023 Updated 2 years ago
- ☆17 Feb 6, 2026 Updated last week
- ☆10 Oct 28, 2020 Updated 5 years ago
- ☆11 Jan 19, 2025 Updated last year
- Long Context Research ☆26 Jan 26, 2026 Updated 2 weeks ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 Oct 11, 2024 Updated last year
- ☆13 May 15, 2025 Updated 9 months ago
- ☆47 Feb 4, 2026 Updated last week
- An open-source Agent Skill framework implementing progressive disclosure architecture ☆40 Jan 30, 2026 Updated 2 weeks ago
- kernel module for modifying device information... ☆22 Sep 24, 2025 Updated 4 months ago
- A supervised fine-tuning method for controllable reasoning length in large language models ☆10 May 8, 2025 Updated 9 months ago
- Official Code Implementation for the CCS 2022 Paper "On the Privacy Risks of Cell-Based NAS Architectures" ☆11 Nov 21, 2022 Updated 3 years ago
- The official codes for our paper at COLING 2022: Semantic-Preserving Adversarial Code Comprehension ☆12 Oct 23, 2022 Updated 3 years ago
- Dataset for the Tensor Trust project ☆48 Mar 17, 2024 Updated last year
- sealos deck ☆11 Mar 30, 2024 Updated last year
- [ICLR 2022] Boosting Randomized Smoothing with Variance Reduced Classifiers ☆12 Mar 29, 2022 Updated 3 years ago
- [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews ☆17 Dec 14, 2025 Updated 2 months ago
- [EMNLP'22] Textual Manifold-based Defense Against Natural Language Adversarial Examples ☆11 Apr 6, 2023 Updated 2 years ago