haizelabs / BEAST-implementation
☆16 · Updated last year
Alternatives and similar repositories for BEAST-implementation
Users interested in BEAST-implementation are comparing it to the libraries listed below.
- A utility to inspect, validate, sign and verify machine learning model files. ☆57 · Updated 4 months ago
- Tree of Attacks (TAP) Jailbreaking Implementation ☆109 · Updated last year
- General research for Dreadnode ☆23 · Updated 11 months ago
- ☆65 · Updated 4 months ago
- Sphynx Hallucination Induction ☆54 · Updated 4 months ago
- ☆22 · Updated last year
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆91 · Updated last month
- A YAML-based format for describing tools to LLMs, like man pages but for robots! ☆71 · Updated last month
- https://arxiv.org/abs/2412.02776 ☆54 · Updated 6 months ago
- Manual Prompt Injection / Red Teaming Tool ☆31 · Updated 8 months ago
- Red-Teaming Language Models with DSPy ☆195 · Updated 3 months ago
- Data Scientists Go To Jupyter ☆64 · Updated 3 months ago
- Implementation of the BEAST adversarial attack for language models (ICML 2024) ☆87 · Updated last year
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 · Updated 10 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆51 · Updated 9 months ago
- Code to break Llama Guard ☆31 · Updated last year
- ☆34 · Updated 6 months ago
- CLI and API server for https://github.com/dreadnode/robopages ☆32 · Updated last month
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆110 · Updated last year
- Small tools to assist with using large language models ☆11 · Updated last year
- A collection of prompt injection mitigation techniques ☆23 · Updated last year
- ☆14 · Updated 5 months ago
- Using ML models for red teaming ☆43 · Updated last year
- ☆71 · Updated 6 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆94 · Updated this week
- A future-proof vulnerability detection benchmark based on CVEs in open-source repos ☆56 · Updated last week
- ☆54 · Updated 8 months ago
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models" ☆49 · Updated 7 months ago
- ☆33 · Updated 2 months ago
- ☆39 · Updated 8 months ago