BunsenFeng / modular_pluralism
Modular Pluralism @ EMNLP 2024
☆20Updated last year
Alternatives and similar repositories for modular_pluralism
Users who are interested in modular_pluralism are comparing it to the repositories listed below
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.☆83Updated 8 months ago
- This repo contains code for our NeurIPS 2023 spotlight paper: Evaluating and Inducing Personality in Pre-trained Language Models☆55Updated last year
- ☆180Updated last year
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces☆98Updated 2 years ago
- ☆57Updated 2 years ago
- ☆116Updated last year
- ☆29Updated last year
- ☆98Updated 2 years ago
- ☆21Updated last year
- The Prism Alignment Project☆84Updated last year
- A resource repository for representation engineering in large language models☆140Updated 11 months ago
- Function Vectors in Large Language Models (ICLR 2024)☆182Updated 6 months ago
- [ICLR 2025] General-purpose activation steering library☆115Updated last month
- code repo for ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs"☆135Updated last year
- ☆46Updated last year
- Repository for the Bias Benchmark for QA dataset.☆129Updated last year
- ☆102Updated last year
- Inspecting and Editing Knowledge Representations in Language Models☆119Updated 2 years ago
- ☆47Updated last month
- Official repository for paper "High-Dimension Human Value Representation in Large Language Models" (NAACL'25 Main)☆23Updated last year
- Sparse probing paper full code.☆63Updated last year
- ☆25Updated 4 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"☆118Updated last year
- Algebraic value editing in pretrained language models☆66Updated 2 years ago
- Steering Llama 2 with Contrastive Activation Addition☆191Updated last year
- Align your LM to express calibrated verbal statements of confidence in its long-form generations.☆27Updated last year
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging☆110Updated 2 years ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models"☆94Updated 2 years ago
- EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975☆38Updated last year
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)☆59Updated last year