shallinan1 / MarcoDetoxification
Repository for "Detoxification with MaRCo: Controllable Revision with Experts and Anti-Experts"
☆9 Updated last year
Alternatives and similar repositories for MarcoDetoxification:
Users who are interested in MarcoDetoxification are comparing it to the repositories listed below.
- Data set for LREC 2020 paper "I Feel Offended, Don't Be Abusive!" ☆18 Updated last year
- ☆24 Updated 2 years ago
- ☆49 Updated last year
- Code for our paper: "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models" ☆53 Updated 2 years ago
- ☆44 Updated last year
- This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?” ☆85 Updated 2 years ago
- Repository for ACL 2022 paper Mix and Match: Learning-free Controllable Text Generation using Energy Language Models ☆42 Updated 3 years ago
- [ACL 2020] Towards Debiasing Sentence Representations ☆65 Updated 2 years ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆43 Updated 6 months ago
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆37 Updated 2 years ago
- This repository contains data, code and models for contextual noncompliance. ☆21 Updated 9 months ago
- EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975 ☆37 Updated last year
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ☆80 Updated 7 months ago
- ☆19 Updated last year
- [ICML 2021] Towards Understanding and Mitigating Social Biases in Language Models ☆61 Updated 2 years ago
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156 ☆32 Updated last year
- Models for automatically transforming toxic text to neutral ☆34 Updated last year
- Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024 ☆17 Updated last month
- ☆39 Updated 2 years ago
- Code for ACL 2023 paper "BOLT: Fast Energy-based Controlled Text Generation with Tunable Biases". ☆21 Updated last year
- ☆29 Updated 11 months ago
- ☆30 Updated 2 years ago
- A codebase for ACL 2023 paper: Mitigating Label Biases for In-context Learning ☆10 Updated last year
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022) ☆15 Updated 2 years ago
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ ☆22 Updated last month
- ☆32 Updated 11 months ago
- This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP". ☆88 Updated 3 years ago
- "Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper) ☆85 Updated last year
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… ☆33 Updated last year
- [ACL 2023] The code for our ACL'23 paper Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Pr… ☆22 Updated 10 months ago