leondz / autoredteam
autoredteam: code for training models that automatically red team other language models
☆15 · Updated 2 years ago
Alternatives and similar repositories for autoredteam
Users interested in autoredteam are comparing it to the libraries listed below.
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a… ☆445 · Updated last year
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆389 · Updated last month
- ☆182 · Updated 2 weeks ago
- Papers about red teaming LLMs and Multimodal models. ☆158 · Updated 7 months ago
- Risks and targets for assessing LLMs & LLM vulnerabilities ☆33 · Updated last year
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆112 · Updated last year
- ☆49 · Updated last year
- Red-Teaming Language Models with DSPy ☆248 · Updated 10 months ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆78 · Updated 4 months ago
- This repository provides a benchmark for prompt injection attacks and defenses in LLMs ☆365 · Updated 2 months ago
- ☆672 · Updated 6 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆370 · Updated 11 months ago
- The fastest Trust Layer for AI Agents ☆144 · Updated 7 months ago
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them! ☆343 · Updated 2 months ago
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] ☆502 · Updated 8 months ago
- LLM security and privacy ☆53 · Updated last year
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆123 · Updated last year
- TAP: An automated jailbreaking method for black-box LLMs ☆212 · Updated last year
- Guardrails for secure and robust agent development ☆377 · Updated 5 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆70 · Updated last year
- Code to break Llama Guard ☆32 · Updated 2 years ago
- [ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability ☆176 · Updated last year
- ☆34 · Updated last year
- ⚡ Vigil ⚡ Detect prompt injections, jailbreaks, and other potentially risky Large Language Model (LLM) inputs ☆433 · Updated last year
- A benchmark for prompt injection detection systems. ☆153 · Updated 2 weeks ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆55 · Updated last year
- Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts ☆559 · Updated last year
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks. ☆95 · Updated last year
- Make your GenAI Apps Safe & Secure: test & harden your system prompt ☆602 · Updated 3 months ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆176 · Updated last year