chziakas / redeval
A library for red-teaming LLM applications with LLMs.
☆28 Updated last year
Alternatives and similar repositories for redeval
Users who are interested in redeval are comparing it to the libraries listed below.
- Red-Teaming Language Models with DSPy ☆238 Updated 9 months ago
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆49 Updated last year
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆99 Updated 7 months ago
- The fastest Trust Layer for AI Agents ☆145 Updated 6 months ago
- Code for the paper "Fishing for Magikarp" ☆175 Updated 6 months ago
- A prompt injection game to collect data for robust ML research ☆65 Updated 10 months ago
- ☆35 Updated 5 months ago
- ☆26 Updated last year
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆112 Updated last year
- Sphynx Hallucination Induction ☆53 Updated 10 months ago
- ☆173 Updated 5 months ago
- Track the progress of LLM context utilisation ☆55 Updated 7 months ago
- autoredteam: code for training models that automatically red team other language models ☆13 Updated 2 years ago
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a… ☆437 Updated last year
- ☆29 Updated 6 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆122 Updated last year
- Curation of prompts that are known to be adversarial to large language models ☆186 Updated 2 years ago
- ☆98 Updated last year
- ☆35 Updated last year
- ☆48 Updated 4 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆51 Updated 7 months ago
- Papers about red teaming LLMs and Multimodal models. ☆156 Updated 6 months ago
- ☆62 Updated 2 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆66 Updated 11 months ago
- ☆65 Updated 2 months ago
- LLM security and privacy ☆52 Updated last year
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆59 Updated last month
- CodeSage: Code Representation Learning At Scale (ICLR 2024) ☆114 Updated last year
- Whispers in the Machine: Confidentiality in Agentic Systems ☆41 Updated 3 weeks ago
- A guide to LLM hacking: fundamentals, prompt injection, offense, and defense ☆176 Updated 2 years ago