chziakas / redeval
A library for red-teaming LLM applications with LLMs.
☆28 · Updated last year
Alternatives and similar repositories for redeval
Users interested in redeval are comparing it to the libraries listed below:
- Red-Teaming Language Models with DSPy ☆219 · Updated 8 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆96 · Updated 6 months ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆112 · Updated last year
- ☆26 · Updated 11 months ago
- The fastest Trust Layer for AI Agents ☆143 · Updated 4 months ago
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆46 · Updated last year
- A prompt injection game to collect data for robust ML research ☆63 · Updated 8 months ago
- Papers about red teaming LLMs and multimodal models. ☆144 · Updated 4 months ago
- ☆164 · Updated 4 months ago
- Code for the paper "Fishing for Magikarp" ☆170 · Updated 5 months ago
- Code to break Llama Guard ☆32 · Updated last year
- Sphynx Hallucination Induction ☆53 · Updated 8 months ago
- LLM security and privacy ☆51 · Updated last year
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆66 · Updated last year
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a… ☆424 · Updated last year
- Whispers in the Machine: Confidentiality in Agentic Systems ☆41 · Updated last week
- ☆93 · Updated 11 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆55 · Updated last year
- This project investigates the security of large language models by performing binary classification of a set of input prompts to discover… ☆50 · Updated last year
- ☆35 · Updated 11 months ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆68 · Updated last month
- CodeSage: Code Representation Learning At Scale (ICLR 2024) ☆113 · Updated 11 months ago
- ☆29 · Updated 4 months ago
- Curation of prompts that are known to be adversarial to large language models ☆184 · Updated 2 years ago
- Measuring the situational awareness of language models ☆38 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆109 · Updated 10 months ago
- ☆88 · Updated last year
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆69 · Updated last year
- Security threats related to MCP (Model Context Protocol), MCP servers, and more ☆36 · Updated 5 months ago
- The jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation. ☆27 · Updated 11 months ago