NextWordDev / psychoevals
Repository for PsychoEvals - a framework for LLM security, psychoanalysis, and moderation.
☆17 Updated 2 years ago
Alternatives and similar repositories for psychoevals
Users who are interested in psychoevals are comparing it to the libraries listed below.
- Repo for the paper titled “CodeGen4Libs: A Two-Stage Approach for Library-Oriented Code Generation” ☆15 Updated last year
- A repository for the paper "Beliefs about AI influence human-AI interaction and can be manipulated to increase perceived trustworthiness,… ☆16 Updated last year
- A set of utilities for running few-shot prompting experiments on large-language models ☆121 Updated last year
- ☆95 Updated last year
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs. ☆49 Updated last year
- Security measure for agentic LLMs using a council of AIs moderated by a veto system. The council judges an agent's actions outputs based o… ☆38 Updated 2 years ago
- Continuously updated list of related resources for generative LLMs like GPT and their analysis and detection. ☆221 Updated last week
- Incremental Python parser for constrained generation of code by LLMs. ☆16 Updated 8 months ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆142 Updated last year
- Knowledge transfer from high-resource to low-resource programming languages for Code LLMs ☆13 Updated 9 months ago
- 🌟 SwarmAgent: A framework for simulating social group dynamics using multi-agent collaboration, aiding insights into collective behavior… ☆12 Updated last year
- Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment ☆100 Updated last year
- ☆21 Updated last year
- Data and code for the paper "NormBank: A Knowledge Bank of Situational Social Norms" ☆27 Updated last year
- ☆36 Updated 2 months ago
- DialOp: Decision-oriented dialogue environments for collaborative language agents ☆106 Updated 6 months ago
- Code and Data for: Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming ☆33 Updated last year
- [NeurIPS 2023] PyTorch code for Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind ☆67 Updated last year
- A Computational Framework for Behavioral Assessment of LLM Therapists ☆27 Updated 7 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆93 Updated last year
- Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts ☆494 Updated 8 months ago
- Code for "Implant Global and Local Hierarchy Information to Sequence based Code Representation Models" ☆12 Updated 5 months ago
- Extensible Booga AGI ☆16 Updated 2 years ago
- ☆26 Updated 9 months ago
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models ☆76 Updated last month
- Evaluating the Moral Beliefs Encoded in LLMs ☆26 Updated 5 months ago
- Code accompanying "How I learned to start worrying about prompt formatting". ☆105 Updated 8 months ago
- Codes and Datasets for our ACL 2023 paper on cognitive reframing of negative thoughts ☆62 Updated last year
- Code/data for MARG (multi-agent review generation) ☆43 Updated 6 months ago
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography. ☆21 Updated last year