chziakas / redeval
A library for red-teaming LLM applications with LLMs.
☆29 · Updated last year
Alternatives and similar repositories for redeval
Users interested in redeval are comparing it to the libraries listed below:
- Red-Teaming Language Models with DSPy ☆250 · Updated 11 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆100 · Updated 9 months ago
- ☆26 · Updated last year
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆53 · Updated last year
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆112 · Updated last year
- ☆66 · Updated 4 months ago
- ☆188 · Updated last month
- Code for the paper "Fishing for Magikarp" ☆179 · Updated 8 months ago
- LLM security and privacy ☆53 · Updated last year
- The fastest Trust Layer for AI Agents ☆149 · Updated 8 months ago
- ☆29 · Updated 8 months ago
- A prompt injection game to collect data for robust ML research ☆68 · Updated last year
- ☆113 · Updated last month
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 · Updated 9 months ago
- Payloads for Attacking Large Language Models ☆118 · Updated 2 weeks ago
- autoredteam: code for training models that automatically red team other language models ☆15 · Updated 2 years ago
- Code to break Llama Guard ☆32 · Updated 2 years ago
- The official repository of the paper "On the Exploitability of Instruction Tuning" ☆68 · Updated last year
- This project investigates the security of large language models by performing binary classification of a set of input prompts to discover… ☆56 · Updated 2 years ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".