LLM-Tuning-Safety / LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
☆290 · Updated last year
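The repository's core point is that ordinary fine-tuning access is enough to erode safety alignment. As a rough illustration of the mechanics involved, the sketch below submits a small fine-tuning job through OpenAI's Python SDK; the JSONL contents, file name, and job settings are placeholder assumptions for illustration only, not the paper's adversarial examples or exact configuration.

```python
# Minimal sketch of launching a small fine-tuning job via the OpenAI Python SDK (v1.x).
# The training examples below are harmless placeholders; the paper's adversarial data
# and exact settings are not reproduced here.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Write a tiny chat-format JSONL training file (10 examples in the paper's setting).
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"Placeholder prompt {i}"},
            {"role": "assistant", "content": f"Placeholder response {i}"},
        ]
    }
    for i in range(10)
]
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# 2. Upload the file and create the fine-tuning job.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",
    training_file=training_file.id,
)
print("Fine-tuning job:", job.id)
```

Once the job completes, the resulting model ID (visible via `client.fine_tuning.jobs.retrieve(job.id)`) can be used with the standard chat completions endpoint like any other model.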
Alternatives and similar repositories for LLMs-Finetuning-Safety:
Users interested in LLMs-Finetuning-Safety are comparing it to the repositories listed below
- ☆170 · Updated last year
- Official repository for the ACL 2024 paper "SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding" ☆129 · Updated 9 months ago
- [ICLR 2024] Official implementation of the paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…" ☆319 · Updated 3 months ago
- Official implementation of AdvPrompter (https://arxiv.org/abs/2404.16873)