haizelabs / BEAST-implementation
☆16 · Updated 10 months ago
Alternatives and similar repositories for BEAST-implementation:
Users interested in BEAST-implementation are comparing it to the libraries listed below.
- ☆64 · Updated 3 months ago
- A utility to inspect, validate, sign and verify machine learning model files. ☆56 · Updated 2 months ago
- General research for Dreadnode ☆21 · Updated 10 months ago
- Tree of Attacks (TAP) Jailbreaking Implementation ☆106 · Updated last year
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆89 · Updated last week
- Red-Teaming Language Models with DSPy ☆183 · Updated 2 months ago
- ☆13 · Updated 10 months ago
- A YAML-based format for describing tools to LLMs, like man pages but for robots! ☆69 · Updated 2 weeks ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆50 · Updated 8 months ago
- ☆31 · Updated 5 months ago
- Fluent student-teacher redteaming ☆20 · Updated 9 months ago
- Implementation of BEAST adversarial attack for language models (ICML 2024) ☆82 · Updated 11 months ago
- ☆13 · Updated 4 months ago
- [IJCAI 2024] Imperio is an LLM-powered backdoor attack. It allows the adversary to issue language-guided instructions to control the vict… ☆41 · Updated 2 months ago
- Manual Prompt Injection / Red Teaming Tool ☆27 · Updated 6 months ago
- Data Scientists Go To Jupyter ☆62 · Updated last month
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆109 · Updated last year
- A collection of prompt injection mitigation techniques. ☆22 · Updated last year
- ☆93 · Updated last month
- https://arxiv.org/abs/2412.02776 ☆52 · Updated 4 months ago
- Future-proof vulnerability detection benchmark, based on CVEs in open-source repos ☆52 · Updated last week
- Codebase of https://arxiv.org/abs/2410.14923 ☆46 · Updated 6 months ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆54 · Updated last month
- CompChomper is a framework for measuring how LLMs perform at code completion. ☆17 · Updated 2 months ago
- The official implementation of the preprint "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆45 · Updated 6 months ago
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 · Updated 8 months ago
- A library for red-teaming LLM applications with LLMs. ☆26 · Updated 6 months ago
- OpenPipe ART (Agent Reinforcement Trainer): train LLM agents ☆108 · Updated this week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆89 · Updated this week
- Source code for the offsecml framework ☆38 · Updated 10 months ago