NextWordDev / psychoevals
Repository for PsychoEvals - a framework for LLM security, psychoanalysis, and moderation.
☆17 · Updated 2 years ago
Alternatives and similar repositories for psychoevals
Users who are interested in psychoevals are comparing it to the libraries listed below.
- Analyzing and scoring reasoning traces of LLMs · ☆45 · Updated 9 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. · ☆53 · Updated 3 months ago
- ☆28 · Updated last year
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs. · ☆49 · Updated last year
- examples and guides to using Nomic Atlas · ☆38 · Updated 2 months ago
- An active inference model of Lacanian psychoanalysis · ☆10 · Updated 3 weeks ago
- A set of utilities for running few-shot prompting experiments on large-language models · ☆121 · Updated last year
- Whispers in the Machine: Confidentiality in Agentic Systems · ☆39 · Updated last month
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models · ☆76 · Updated last month
- ☆29 · Updated 10 months ago
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. · ☆26 · Updated 10 months ago
- APIBench is a benchmark for evaluating the performance of API recommendation approaches released in the paper "Revisiting, Benchmarking a… · ☆58 · Updated 2 years ago
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography. · ☆22 · Updated last year
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM · ☆65 · Updated 7 months ago
- The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang… · ☆115 · Updated 5 months ago
- ☆16 · Updated 9 months ago
- Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts · ☆501 · Updated 9 months ago
- DialOp: Decision-oriented dialogue environments for collaborative language agents · ☆106 · Updated 7 months ago
- Dataset for the Tensor Trust project · ☆43 · Updated last year
- Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria… · ☆50 · Updated 2 years ago
- Alignment with a millennium of moral progress. Spotlight@NeurIPS 2024 Track on Datasets and Benchmarks. · ☆22 · Updated 2 months ago
- Awesome deliberative prompting: How to ask LLMs to produce reliable reasoning and make reason-responsive decisions. · ☆119 · Updated 4 months ago
- Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation. · ☆100 · Updated last year
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs · ☆82 · Updated 6 months ago
- Evaluating the Moral Beliefs Encoded in LLMs · ☆26 · Updated 6 months ago
- Easiest way to build custom agents, in a no-code Notion-style editor, using simple macros. · ☆27 · Updated 7 months ago
- ☆26 · Updated last year
- ☆43 · Updated 7 months ago
- Source code for paper: INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing · ☆26 · Updated 7 months ago
- Official repo for Customized but Compromised: Assessing Prompt Injection Risks in User-Designed GPTs · ☆28 · Updated last year