chziakas / redeval
A library for red-teaming LLM applications with LLMs.
☆26 · Updated 7 months ago
Alternatives and similar repositories for redeval
Users interested in redeval are comparing it to the libraries listed below.
- Red-Teaming Language Models with DSPy ☆195 · Updated 3 months ago
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆42 · Updated 8 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆91 · Updated last month
- ☆34 · Updated 6 months ago
- ☆109 · Updated 2 weeks ago
- Papers about red teaming LLMs and Multimodal models. ☆121 · Updated last week
- Dataset for the Tensor Trust project ☆40 · Updated last year
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆63 · Updated last year
- Sphynx Hallucination Induction ☆54 · Updated 4 months ago
- This project investigates the security of large language models by performing binary classification of a set of input prompts to discover… ☆39 · Updated last year
- This repository provides a benchmark for prompt injection attacks and defenses ☆216 · Updated last week
- The fastest Trust Layer for AI Agents ☆136 · Updated last week
- Code to break Llama Guard ☆31 · Updated last year
- ☆22 · Updated 7 months ago
- [ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability ☆151 · Updated 5 months ago
- A prompt injection game to collect data for robust ML research ☆61 · Updated 4 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆33 · Updated last month
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆69 · Updated last year
- ☆71 · Updated 6 months ago
- ☆44 · Updated 10 months ago
- Code for the paper "Fishing for Magikarp" ☆155 · Updated 3 weeks ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆175 · Updated this week
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆16 · Updated 9 months ago
- ☆42 · Updated 2 months ago
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆49 · Updated 7 months ago
- ☆18 · Updated last year
- Learning to Retrieve by Trying - Source code for "Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval" ☆36 · Updated 7 months ago
- ☆63 · Updated 11 months ago
- A re-implementation of the "Red Teaming Language Models with Language Models" paper by Perez et al., 2022 ☆31 · Updated last year
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆51 · Updated 9 months ago