NextWordDev / psychoevals
Repository for PsychoEvals - a framework for LLM security, psychoanalysis, and moderation.
☆18 · Updated 2 years ago
Alternatives and similar repositories for psychoevals
Users interested in psychoevals are comparing it to the libraries listed below:
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a… ☆452 · Updated last year
- LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins ☆29 · Updated last year
- Analyzing and scoring reasoning traces of LLMs ☆47 · Updated last year
- Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts ☆564 · Updated last year
- autoredteam: code for training models that automatically red team other language models ☆15 · Updated 2 years ago
- Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria… ☆70 · Updated 2 years ago
- 📚 A curated list of papers & technical articles on AI Quality & Safety ☆200 · Updated 9 months ago
- A set of utilities for running few-shot prompting experiments on large language models ☆126 · Updated 2 years ago
- Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023 ☆251 · Updated 2 years ago
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them! ☆349 · Updated 3 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆377 · Updated last year
- This repo contains the code for generating the ToxiGen dataset, published at ACL 2022. ☆345 · Updated last year
- ☆23 · Updated 2 years ago
- ☆56 · Updated 10 months ago
- Curation of prompts that are known to be adversarial to large language models ☆188 · Updated 2 years ago
- Can AI-Generated Text be Reliably Detected? ☆88 · Updated 2 years ago
- ☆228 · Updated 4 years ago
- The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang… ☆152 · Updated 5 months ago
- Whispers in the Machine: Confidentiality in Agentic Systems ☆41 · Updated last month
- Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment ☆108 · Updated last year
- ☆193 · Updated 2 years ago
- ☆100 · Updated last year
- TAP: An automated jailbreaking method for black-box LLMs ☆217 · Updated last year
- The open-source repository of FuzzLLM ☆36 · Updated last year
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆54 · Updated last year
- Continuously updated list of related resources for generative LLMs like GPT and their analysis and detection. ☆230 · Updated 8 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆105 · Updated last year
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆338 · Updated last year
- Official repo for Customized but Compromised: Assessing Prompt Injection Risks in User-Designed GPTs ☆30 · Updated 2 years ago
- LLM experiments done during SERI MATS, focusing on activation steering / interpreting activation spaces ☆100 · Updated 2 years ago