shallinan1 / MarcoDetoxification

Repository for "Detoxification with MaRCo: Controllable Revision with Experts and Anti-Experts"

☆9

Alternatives and similar repositories for MarcoDetoxification

Users that are interested in MarcoDetoxification are comparing it to the libraries listed below

balevinstein / Probes
☆51 · Updated 2 years ago
mega002 / ff-layers
The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le…
☆94 · Updated 3 years ago
kawine / dataset_difficulty
"Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper)
☆87 · Updated last year
launchnlp / BOLT
Code for ACL 2023 paper "BOLT: Fast Energy-based Controlled Text Generation with Tunable Biases".
☆21 · Updated last year
timoschick / self-debiasing
This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP".
☆88 · Updated 3 years ago
tatsu-lab / opinions_qa
☆110 · Updated last year
joeljang / knowledge-unlearning
[ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models
☆82 · Updated 10 months ago
facebookresearch / ResponsibleNLP
Repository for research in the field of Responsible NLP at Meta.
☆201 · Updated 2 months ago
xiye17 / TextualExplInContext
The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022)
☆15 · Updated 2 years ago
qkaren / COLD_decoding
☆108 · Updated 3 years ago
evandez / REMEDI
Inspecting and Editing Knowledge Representations in Language Models
☆116 · Updated last year
McGill-NLP / bias-bench
ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models.
☆139 · Updated 7 months ago
dannyallover / overthinking_the_truth
☆29 · Updated last year
jaehunjung1 / Maieutic-Prompting
☆50 · Updated last year
GXimingLu / Quark
☆75 · Updated last year
lifan-yuan / OOD_NLP
[NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua…
☆33 · Updated 2 years ago
ekinakyurek / influence
Code for "Tracing Knowledge in Language Models Back to the Training Data"
☆38 · Updated 2 years ago
zlin7 / UQ-NLG
☆93 · Updated last year
hendrycks / ethics
Aligning AI With Shared Human Values (ICLR 2021)
☆289 · Updated 2 years ago
gorokoba560 / norm-analysis-of-transformer
☆86 · Updated last year
google-research / lm-extraction-benchmark
☆294 · Updated last week
INK-USC / CrossFit
Code for the paper "CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP" (https://arxiv.org/abs/2104.08835)
☆111 · Updated 3 years ago
allenai / cartography
Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
☆200 · Updated 2 years ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆49 · Updated 9 months ago
i-gallegos / Fair-LLM-Benchmark
☆137 · Updated last year
cambridgeltl / zepo
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al., EMNLP 2024)
☆13 · Updated 9 months ago
roeehendel / icl_task_vectors
☆95 · Updated last year
aviclu / ffn-values
☆61 · Updated 2 years ago
p-lambda / incontext-learning
Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici…
☆108 · Updated last year
awebson / prompt_semantics
This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?”
☆85 · Updated 3 years ago