jianggy / MPI

This repo contains code for our NeurIPS 2023 spotlight paper: Evaluating and Inducing Personality in Pre-trained Language Models

☆52

Alternatives and similar repositories for MPI

Users who are interested in MPI are comparing it to the repositories listed below.

sotopia-lab / sotopia
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
☆236 Updated this week
joeljang / RLPHF
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
☆108 Updated last year
cicl-stanford / procedural-evals-tom
☆33 Updated 2 years ago
roeehendel / icl_task_vectors
☆96 Updated last year
sotopia-lab / sotopia-pi
Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)
☆70 Updated last year
Mars-tin / awesome-theory-of-mind
Machine Theory of Mind Reading List. Built upon EMNLP Findings 2023 Paper: Towards A Holistic Landscape of Situated Theory of Mind in Lar…
☆140 Updated 5 months ago
Walter0807 / RepBelief
[ICML 2024] Language Models Represent Beliefs of Self and Others
☆33 Updated 10 months ago
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆176 Updated 3 months ago
ZhaofengWu / counterfactual-evaluation
☆56 Updated 2 months ago
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆114 Updated 10 months ago
asaparov / prontoqa
Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.
☆147 Updated 9 months ago
evandez / REMEDI
Inspecting and Editing Knowledge Representations in Language Models
☆116 Updated 2 years ago
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123 Updated 11 months ago
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆59 Updated 10 months ago
HannahKirk / prism-alignment
The Prism Alignment Project
☆79 Updated last year
edenbiran / RippleEdits
Evaluating the Ripple Effects of Knowledge Editing in Language Models
☆56 Updated last year
Glaciohound / LM-Steer
Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)
☆123 Updated 3 weeks ago
QingruZhang / PASTA
PASTA: Post-hoc Attention Steering for LLMs
☆122 Updated 8 months ago
penguinnnnn / awesome-llm-and-society
Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs.
☆49 Updated last year
YuxiXie / SelfEval-Guided-Decoding
☆100 Updated last year
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆163 Updated 3 months ago
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆128 Updated last year
deeplearning-wisc / args
☆43 Updated last year
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆74 Updated 5 months ago
mukhal / GRACE
[EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning
☆48 Updated 9 months ago
saprmarks / geometry-of-truth
☆89 Updated last year
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆147 Updated 9 months ago
GAIR-NLP / Preference-Dissection
☆25 Updated last year
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆62 Updated 8 months ago
facebookresearch / RLCD
Reproduction of "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment"
☆69 Updated last year