haizelabs / BEAST-implementation
☆16 · Updated last year
Alternatives and similar repositories for BEAST-implementation
Users interested in BEAST-implementation are comparing it to the repositories listed below.
- Tree of Attacks (TAP) Jailbreaking Implementation · ☆117 · Updated 2 years ago
- ☆66 · Updated 4 months ago
- General research for Dreadnode · ☆27 · Updated last year
- A repository of Language Model Vulnerabilities and Exposures (LVEs) · ☆112 · Updated last year
- Implementation of BEAST adversarial attack for language models (ICML 2024) · ☆90 · Updated last year
- ☆29 · Updated 2 years ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite · ☆100 · Updated 9 months ago
- A utility to inspect, validate, sign and verify machine learning model files · ☆65 · Updated last year
- Example agents for the Dreadnode platform · ☆22 · Updated last month
- Red-Teaming Language Models with DSPy · ☆250 · Updated 11 months ago
- Multi-agent system (MAS) hijacking demos · ☆40 · Updated this week
- Code for the paper "Defeating Prompt Injections by Design" · ☆246 · Updated 7 months ago
- Arxiv + Notion Sync · ☆20 · Updated 8 months ago
- Data Scientists Go To Jupyter · ☆68 · Updated 11 months ago
- ☆38 · Updated 8 months ago
- [IJCAI 2024] Imperio is an LLM-powered backdoor attack. It allows the adversary to issue language-guided instructions to control the vict… · ☆44 · Updated 11 months ago
- A simple, 100% Rust implementation of a vector storage database with on-disk persistence · ☆31 · Updated last year
- CompChomper is a framework for measuring how LLMs perform at code completion · ☆19 · Updated 9 months ago
- jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation · ☆27 · Updated last year
- https://arxiv.org/abs/2412.02776 · ☆67 · Updated last year
- A multi-layer defence for protecting your applications against prompt injection attacks · ☆21 · Updated last month
- Using ML models for red teaming · ☆45 · Updated 2 years ago
- Sphynx Hallucination Induction · ☆53 · Updated last year
- ☆23 · Updated 2 years ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research · ☆134 · Updated this week
- An interactive CLI application for interacting with authenticated Jupyter instances · ☆55 · Updated 9 months ago
- Codebase of https://arxiv.org/abs/2410.14923 · ☆54 · Updated last year
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness · ☆26 · Updated last year
- A YAML-based format for describing tools to LLMs, like man pages but for robots! · ☆84 · Updated 9 months ago
- Here Comes the AI Worm: Preventing the Propagation of Adversarial Self-Replicating Prompts Within GenAI Ecosystems · ☆222 · Updated 5 months ago
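For orientation, the technique behind the repository this list centers on: BEAST (ICML 2024) builds an adversarial suffix with beam search, expanding each candidate suffix with sampled tokens and keeping the highest-scoring ones at every step. Below is a minimal, self-contained sketch of that loop; the `VOCAB`, the `toy_score` objective, and the beam parameters are placeholder assumptions so it runs standalone, not the haizelabs implementation. In the actual attack, candidates are sampled from and scored by the target model.

```python
import heapq
import random

# Toy stand-ins so the sketch runs on its own: a real BEAST attack
# scores candidate suffixes with the target language model's logits.
# The vocabulary and objective below are placeholder assumptions.
VOCAB = [f"tok{i}" for i in range(100)]
rng = random.Random(0)
WEIGHTS = {tok: rng.random() for tok in VOCAB}

def toy_score(suffix: list[str]) -> float:
    # Placeholder adversarial objective, standing in for e.g. the
    # log-probability of a target response given prompt + suffix.
    return sum(WEIGHTS[tok] for tok in suffix)

def beam_search_attack(beam_width: int = 5,
                       candidates_per_step: int = 20,
                       suffix_len: int = 8) -> list[str]:
    # Each beam entry is (score, suffix); start from the empty suffix.
    beams: list[tuple[float, list[str]]] = [(0.0, [])]
    for _ in range(suffix_len):
        expanded = []
        for _, suffix in beams:
            # BEAST samples candidate tokens from the model's
            # next-token distribution; here we sample uniformly.
            for tok in rng.sample(VOCAB, candidates_per_step):
                candidate = suffix + [tok]
                expanded.append((toy_score(candidate), candidate))
        # Keep only the beam_width highest-scoring suffixes.
        beams = heapq.nlargest(beam_width, expanded, key=lambda b: b[0])
    return beams[0][1]

if __name__ == "__main__":
    print(" ".join(beam_search_attack()))
```

The beam is what keeps the search tractable: each step scores beam_width × candidates_per_step suffixes rather than the full cross-product of all token sequences.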