chziakas / redeval
A library for red-teaming LLM applications with LLMs.
☆28 · Updated 10 months ago
Alternatives and similar repositories for redeval
Users interested in redeval are comparing it to the libraries listed below.
- Red-Teaming Language Models with DSPy ☆212 · Updated 6 months ago
- ☆142 · Updated 2 months ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆113 · Updated last year
- ☆86 · Updated 9 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆95 · Updated 4 months ago
- Code for the paper "Fishing for Magikarp" ☆163 · Updated 3 months ago
- ☆28 · Updated 3 months ago
- jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation. ☆25 · Updated 9 months ago
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆45 · Updated 11 months ago
- ☆25 · Updated 10 months ago
- ☆34 · Updated 9 months ago
- The fastest Trust Layer for AI Agents ☆142 · Updated 3 months ago
- LLM security and privacy ☆50 · Updated 10 months ago
- A prompt injection game to collect data for robust ML research ☆63 · Updated 7 months ago
- ☆34 · Updated 2 months ago
- A benchmark for prompt injection detection systems. ☆127 · Updated last month
- ☆25 · Updated last month
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆64 · Updated 10 months ago
- Sphynx Hallucination Induction ☆53 · Updated 7 months ago
- Code for reproducing our paper "Are Sparse Autoencoders Useful? A Case Study in Sparse Probing" ☆27 · Updated 5 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆112 · Updated last year
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆64 · Updated last year
- Papers about red teaming LLMs and multimodal models. ☆134 · Updated 3 months ago
- This project investigates the security of large language models by performing binary classification of a set of input prompts to discover… ☆46 · Updated last year
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆90 · Updated 9 months ago
- 📚 A curated list of papers & technical articles on AI Quality & Safety ☆191 · Updated 4 months ago
- ☆45 · Updated 4 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated 11 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆70 · Updated last year
- CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on… ☆52 · Updated last month