shallinan1 / MarcoDetoxification
Repository for "Detoxification with MaRCo: Controllable Revision with Experts and Anti-Experts"
☆9 Updated last year
Alternatives and similar repositories for MarcoDetoxification
Users who are interested in MarcoDetoxification are comparing it to the repositories listed below.
- ☆51 Updated 2 years ago
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… ☆94 Updated 3 years ago
- "Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper) ☆87 Updated last year
- Code for ACL 2023 paper "BOLT: Fast Energy-based Controlled Text Generation with Tunable Biases". ☆21 Updated last year
- This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP". ☆88 Updated 3 years ago
- ☆110 Updated last year
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ☆82 Updated 10 months ago
- Repository for research in the field of Responsible NLP at Meta. ☆201 Updated 2 months ago
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022) ☆15 Updated 2 years ago
- ☆108 Updated 3 years ago
- Inspecting and Editing Knowledge Representations in Language Models ☆116 Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆139 Updated 7 months ago
- ☆29 Updated last year
- ☆50 Updated last year
- ☆75 Updated last year
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… ☆33 Updated 2 years ago
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆38 Updated 2 years ago
- ☆93 Updated last year
- Aligning AI With Shared Human Values (ICLR 2021) ☆289 Updated 2 years ago
- ☆86 Updated last year
- ☆294 Updated last week
- Code for paper "CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP" (https://arxiv.org/abs/2104.08835) ☆111 Updated 3 years ago
- Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics ☆200 Updated 2 years ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆49 Updated 9 months ago
- ☆137 Updated last year
- Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al., EMNLP 2024) ☆13 Updated 9 months ago
- ☆95 Updated last year
- ☆61 Updated 2 years ago
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici… ☆108 Updated last year
- This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?” ☆85 Updated 3 years ago