cooperleong00 / ToxificationReversal

Code for the paper "Self-Detoxifying Language Models via Toxification Reversal" (EMNLP 2023)

☆16

Alternatives and similar repositories for ToxificationReversal

Users who are interested in ToxificationReversal are comparing it to the repositories listed below.

iwangjian / Color4Dial
Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue (ACL Findings 2023)
☆21 Updated 3 weeks ago
wangjs9 / Muffin
Code for Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback (ACL 2024 Findings)
☆16 Updated last year
wangjs9 / Aligned-dPM
PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach
☆32 Updated 2 years ago
GAIR-NLP / alignment-for-honesty
☆76 Updated last year
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆34 Updated last year
CUHK-ARISE / LLMPersonality
Code and Results of the Paper: On the Reliability of Psychological Scales on Large Language Models
☆30 Updated last year
iwangjian / TopDial
Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation (EMNLP 2023)
☆30 Updated last month
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆63 Updated last year
Yifan-Song793 / GoodBadGreedy
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
☆30 Updated last year
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆69 Updated last year
iwangjian / Midi-Tuning
Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024)
☆24 Updated last month
penguinnnnn / awesome-llm-and-society
Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs.
☆50 Updated 2 years ago
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆82 Updated 10 months ago
FreedomIntelligence / OVM
☆68 Updated last year
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆63 Updated 11 months ago
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆70 Updated 3 years ago
RUCAIBox / HaluEval-2.0
☆47 Updated last year
YiCheng98 / IntegrativeDecoding
Official Implementation for the paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency"
☆32 Updated 7 months ago
zhaochen0110 / conflictbank
Code and data for "ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM" (NeurIPS 2024 Track Datasets and…
☆60 Updated 6 months ago
GAIR-NLP / weak-to-strong-reasoning
☆58 Updated last year
siyuyuan / coscript
Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning
☆36 Updated 2 years ago
qinyiwei / InfoBench
☆57 Updated last year
OpenMOSS / Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
☆84 Updated last year
hkust-nlp / PEM_composition
[NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
☆61 Updated 2 years ago
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆63 Updated last year
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆119 Updated last year
hanxuhu / SeqIns
The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…
☆30 Updated last year
hanningzhang / prm
☆17 Updated last year
Re-Align / AlignTDS
Analyzing LLM Alignment via Token distribution shift
☆17 Updated last year
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆76 Updated last month