google-research / camel-prompt-injection
Code for the paper "Defeating Prompt Injections by Design"
☆40 · Updated 3 weeks ago
Alternatives and similar repositories for camel-prompt-injection
Users interested in camel-prompt-injection are comparing it to the repositories listed below.
- ☆34 · Updated 8 months ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… · ☆59 · Updated 4 months ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks. · ☆71 · Updated last year
- A repository of Language Model Vulnerabilities and Exposures (LVEs). · ☆112 · Updated last year
- Red-Teaming Language Models with DSPy · ☆202 · Updated 5 months ago
- ☆119 · Updated last month
- Codebase for "Obfuscated Activations Bypass LLM Latent-Space Defenses" · ☆21 · Updated 5 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives · ☆69 · Updated last year
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. · ☆202 · Updated last week
- A prompt injection game to collect data for robust ML research · ☆62 · Updated 5 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. · ☆92 · Updated 3 months ago
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. · ☆26 · Updated 11 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models · ☆51 · Updated 10 months ago
- Implementation of the BEAST adversarial attack for language models (ICML 2024) · ☆88 · Updated last year
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". · ☆109 · Updated last year
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" · ☆58 · Updated last week
- Dataset for the Tensor Trust project · ☆43 · Updated last year
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. · ☆113 · Updated last year
- CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on… · ☆41 · Updated 3 weeks ago
- ☆55 · Updated 9 months ago
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] · ☆43 · Updated last year
- ☆90 · Updated last year
- ☆75 · Updated 7 months ago
- Accompanying code and SEP dataset for the paper "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" · ☆54 · Updated 4 months ago
- Sphynx Hallucination Induction · ☆53 · Updated 5 months ago
- CodeSage: Code Representation Learning At Scale (ICLR 2024) · ☆109 · Updated 8 months ago
- Whispers in the Machine: Confidentiality in Agentic Systems · ☆39 · Updated last month
- Code to break Llama Guard · ☆31 · Updated last year
- A better way of testing, inspecting, and analyzing AI Agent traces. · ☆39 · Updated this week
- ☆72 · Updated last week