sophie-xhonneux / Continuous-AdvTrain
☆27 Updated 11 months ago
Alternatives and similar repositories for Continuous-AdvTrain
Users who are interested in Continuous-AdvTrain are comparing it to the repositories listed below.
- [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion ☆50 Updated 9 months ago
- Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as … ☆68 Updated this week
- Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks" ☆56 Updated 11 months ago
- This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector" ☆43 Updated 8 months ago
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning ☆10 Updated 9 months ago
- Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"