rb81 / prompt-hacking-classifier
A flexible and portable solution that uses a single robust prompt and customized hyperparameters to classify user messages as either malicious or safe, helping to prevent jailbreaking and manipulation of chatbots and other LLM-based solutions.
16 · Aug 8, 2025 · Updated 6 months ago

Alternatives and similar repositories for prompt-hacking-classifier

Users who are interested in prompt-hacking-classifier are comparing it to the libraries listed below.
