NextWordDev / psychoevals
Repository for PsychoEvals - a framework for LLM security, psychoanalysis, and moderation.
☆18 · Updated 2 years ago
Alternatives and similar repositories for psychoevals
Users interested in psychoevals are comparing it to the libraries listed below.
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a… ☆452 · Updated last year
- ☆54 · Updated 10 months ago
- Analyzing and scoring reasoning traces of LLMs ☆47 · Updated last year
- A set of utilities for running few-shot prompting experiments on large language models ☆126 · Updated 2 years ago
- Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts ☆564 · Updated last year
- Large Language Models Meet NL2Code: A Survey ☆35 · Updated last year
- ☆69 · Updated last year
- autoredteam: code for training models that automatically red team other language models ☆15 · Updated 2 years ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆100 · Updated 2 years ago
- The open-source repository of FuzzLLM ☆36 · Updated last year
- The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang… ☆152 · Updated 5 months ago
- Repo for the paper "CodeGen4Libs: A Two-Stage Approach for Library-Oriented Code Generation" ☆14 · Updated 2 years ago
- Whispers in the Machine: Confidentiality in Agentic Systems ☆41 · Updated last month
- Plurals: A System for Guiding LLMs Via Simulated Social Ensembles ☆31 · Updated last month
- [ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs` ☆93 · Updated 5 months ago
- Test LLMs against jailbreaks and unprecedented harms ☆40 · Updated last year
- Awesome deliberative prompting: How to ask LLMs to produce reliable reasoning and make reason-responsive decisions. ☆120 · Updated last year
- ☆40 · Updated last year
- Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria… ☆70 · Updated 2 years ago
- This repo contains the code for generating the ToxiGen dataset, published at ACL 2022. ☆345 · Updated last year
- Code for "Preventing Language Models From Hiding Their Reasoning", which evaluates defenses against LLM steganography. ☆25 · Updated 2 years ago
- Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023 ☆251 · Updated 2 years ago
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs ☆315 · Updated last year
- Improving Alignment and Robustness with Circuit Breakers ☆258 · Updated last year
- A collection of works that investigate social agents, simulations and their real-world impact in text, embodied, and robotics contexts. ☆109 · Updated last year
- A survey and reflection on the latest research breakthroughs in LLM-generated Text detection, including data, detectors, metrics, current… ☆80 · Updated last year
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them! ☆349 · Updated 3 months ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM ☆83 · Updated last year
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆183 · Updated last year
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆107 · Updated 9 months ago