haizelabs / BEAST-implementation
☆16 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for BEAST-implementation
- ☆61 · Updated 3 weeks ago
- A utility to inspect, validate, sign and verify machine learning model files. ☆40 · Updated this week
- Improve prompts for models such as GPT-3 and GPT-J using templates and hyperparameter optimization. ☆41 · Updated last year
- Tree of Attacks (TAP) Jailbreaking Implementation ☆94 · Updated 9 months ago
- General research for Dreadnode ☆17 · Updated 4 months ago
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 · Updated 3 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆85 · Updated 4 months ago
- ☆127 · Updated last month
- ☆13 · Updated 4 months ago
- ☆33 · Updated 2 weeks ago
- Implementation of the BEAST adversarial attack for language models (ICML 2024) ☆72 · Updated 5 months ago
- Red-Teaming Language Models with DSPy ☆142 · Updated 7 months ago
- A trace analysis tool for AI agents. ☆118 · Updated 3 weeks ago
- Future-proof vulnerability detection benchmark based on CVEs in open-source repos ☆44 · Updated last week
- ☆57 · Updated last week
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆84 · Updated 8 months ago
- Sphynx Hallucination Induction ☆47 · Updated 3 months ago
- A collection of prompt injection mitigation techniques. ☆17 · Updated last year
- Data Scientists Go To Jupyter ☆57 · Updated 2 years ago
- ☆15 · Updated 6 months ago
- De-redacting Elon's Email with Character-count Constrained Llama2 Decoding ☆10 · Updated 8 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆46 · Updated 2 months ago
- Stage 1: Sensitive Email/Chat Classification for Adversary Agent Emulation (espionage). This project is meant to extend Red Reaper v1 whi… ☆23 · Updated 2 months ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆106 · Updated 7 months ago
- An interactive CLI application for interacting with authenticated Jupyter instances. ☆49 · Updated 7 months ago
- A Completely Modular LLM Reverse Engineering, Red Teaming, and Vulnerability Research Framework. ☆17 · Updated this week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆62 · Updated this week
- OllaDeck is a purple technology stack for Generative AI (text modality) cybersecurity. It provides a comprehensive set of tools for both … ☆13 · Updated last month
- Score LLM pretraining data with classifiers ☆55 · Updated last year
- Payloads for Attacking Large Language Models ☆62 · Updated 4 months ago