chziakas / redeval
A library for red-teaming LLM applications with LLMs.
☆24 · Updated 3 months ago
Alternatives and similar repositories for redeval:
Users interested in redeval are comparing it to the libraries listed below.
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming" ☆36 · Updated 4 months ago
- Red-Teaming Language Models with DSPy ☆154 · Updated 9 months ago
- A prompt injection game to collect data for robust ML research ☆49 · Updated 3 weeks ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆88 · Updated 7 months ago
- Code for the paper "Fishing for Magikarp" ☆140 · Updated this week
- ☆26 · Updated 2 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆93 · Updated 10 months ago
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆58 · Updated 11 months ago
- A collection of automated evaluators for assessing jailbreak attempts. ☆92 · Updated last week
- ☆19 · Updated 2 months ago
- ☆39 · Updated 5 months ago
- The library for LLM-based web-agent applications ☆46 · Updated last week
- ☆67 · Updated last month
- LLM security and privacy ☆43 · Updated 3 months ago
- Realign is a testing and simulation framework for AI applications. ☆14 · Updated last month
- Official repo for the paper "PHUDGE: Phi-3 as Scalable Judge". Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 6 months ago
- This project investigates the security of large language models by performing binary classification of a set of input prompts to discover… ☆36 · Updated last year
- Whispers in the Machine: Confidentiality in LLM-integrated Systems ☆31 · Updated last month
- LLM Evals for Text Summarization and RAG use-cases. ☆35 · Updated 11 months ago
- Dataset for the Tensor Trust project ☆35 · Updated 10 months ago
- ☆16 · Updated 7 months ago
- Sphynx Hallucination Induction ☆51 · Updated 5 months ago
- Implementation of the BEAST adversarial attack for language models (ICML 2024) ☆79 · Updated 8 months ago
- Track the progress of LLM context utilisation ☆53 · Updated 6 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆175 · Updated 3 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆48 · Updated 5 months ago
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆43 · Updated 8 months ago
- ☆67 · Updated 2 months ago
- Package to optimize adversarial attacks against (large) language models with varied objectives ☆66 · Updated 10 months ago
- Manual Prompt Injection / Red Teaming Tool ☆14 · Updated 3 months ago