chziakas / redeval

A library for red-teaming LLM applications with LLMs.

☆28

Alternatives and similar repositories for redeval

Users who are interested in redeval are comparing it to the libraries listed below.

haizelabs / dspy-redteam
Red-Teaming Language Models with DSPy
☆238 Updated 9 months ago
Babelscape / ALERT
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
☆49 Updated last year
haizelabs / get-haized
A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.
☆99 Updated 7 months ago
ZenGuard-AI / fast-llm-security-guardrails
The fastest Trust Layer for AI Agents
☆145 Updated 6 months ago
cohere-ai / magikarp
Code for the paper "Fishing for Magikarp"
☆175 Updated 6 months ago
HumanCompatibleAI / tensor-trust
A prompt injection game to collect data for robust ML research
☆65 Updated 10 months ago
redteaming-arena / redteam-arena
☆35 Updated 5 months ago
haizelabs / bijection-learning
☆26 Updated last year
lve-org / lve
A repository of Language Model Vulnerabilities and Exposures (LVEs).
☆112 Updated last year
haizelabs / sphynx
Sphynx Hallucination Induction
☆53 Updated 10 months ago
andyzorigin / cybench
☆173 Updated 5 months ago
stunningpixels / lou-eval
Track the progress of LLM context utilisation
☆55 Updated 7 months ago
leondz / autoredteam
autoredteam: code for training models that automatically red team other language models
☆13 Updated 2 years ago
agencyenterprise / PromptInject
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a…
☆437 Updated last year
thu-coai / Backdoor-Data-Extraction
☆29 Updated 6 months ago
anthropics / sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
☆122 Updated last year
hwchase17 / adversarial-prompts
Curation of prompts that are known to be adversarial to large language models
☆186 Updated 2 years ago
AIM-Intelligence / Automated-Multi-Turn-Jailbreaks
☆98 Updated last year
RapidResponseBench / rapidresponsebench
☆35 Updated last year
uiuc-kang-lab / agentic-benchmarks
☆48 Updated 4 months ago
safety-research / open-source-alignment-faking
Open Source Replication of Anthropic's Alignment Faking Paper
☆51 Updated 7 months ago
Libr-AI / OpenRedTeaming
Papers about red teaming LLMs and Multimodal models.
☆156 Updated 6 months ago
google-deepmind / dangerous-capability-evaluations
☆62 Updated 2 months ago
oriyor / assistantbench
Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆66 Updated 11 months ago
NickNameInvalid / LLM_CTF
☆65 Updated 2 months ago
briland / LLM-security-and-privacy
LLM security and privacy
☆52 Updated last year
ahstat / episodic-memory-benchmark
Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…
☆59 Updated last month
amazon-science / CodeSage
CodeSage: Code Representation Learning At Scale (ICLR 2024)
☆114 Updated last year
LostOxygen / llm-confidentiality
Whispers in the Machine: Confidentiality in Agentic Systems
☆41 Updated 3 weeks ago
forcesunseen / llm-hackers-handbook
A guide to LLM hacking: fundamentals, prompt injection, offense, and defense
☆176 Updated 2 years ago