BunsenFeng / modular_pluralism
Modular Pluralism @ EMNLP 2024
☆17 Updated 5 months ago
Alternatives and similar repositories for modular_pluralism:
Users who are interested in modular_pluralism are comparing it to the repositories listed below.
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆62 Updated 3 months ago
- This repo contains code for our NeurIPS 2023 spotlight paper: Evaluating and Inducing Personality in Pre-trained Language Models ☆49 Updated last year
- A resource repository for representation engineering in large language models ☆104 Updated 3 months ago
- ☆89 Updated last year
- ☆47 Updated last year
- Steering Llama 2 with Contrastive Activation Addition ☆124 Updated 8 months ago
- ☆30 Updated 9 months ago
- ☆34 Updated last year
- ☆22 Updated 11 months ago
- ☆104 Updated 9 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆65 Updated 11 months ago
- ☆154 Updated 8 months ago
- [NAACL'25] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆48 Updated 2 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆88 Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆84 Updated last week
- Inspecting and Editing Knowledge Representations in Language Models ☆112 Updated last year
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… ☆89 Updated 3 years ago
- ☆22 Updated 4 months ago
- Official repository for paper "High-Dimension Human Value Representation in Large Language Models" ☆22 Updated 7 months ago
- ☆61 Updated last year
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆23 Updated 7 months ago
- ☆30 Updated 9 months ago
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆106 Updated 5 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆71 Updated last year
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆57 Updated last year
- ☆76 Updated 6 months ago
- Augmenting Statistical Models with Natural Language Parameters ☆22 Updated 5 months ago
- Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al., EMNLP 2024) ☆13 Updated 4 months ago
- EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975 ☆37 Updated last year
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆71 Updated 2 months ago