BatsResearch / cross-lingual-detox
Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024
☆17Updated 7 months ago
Alternatives and similar repositories for cross-lingual-detox
Users who are interested in cross-lingual-detox are comparing it to the libraries listed below
- ☆29Updated last year
 - Restore safety in fine-tuned language models through task arithmetic☆29Updated last year
 - Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…☆12Updated 9 months ago
 - Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]☆19Updated last year
 - Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model☆68Updated 3 years ago
 - ☆44Updated last year
 - ☆67Updated 2 years ago
 - Augmenting Statistical Models with Natural Language Parameters☆29Updated last year
 - A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.☆83Updated 7 months ago
 - ☆49Updated 2 years ago
 - AI Logging for Interpretability and Explainability🔬☆130Updated last year
 - [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors☆81Updated 10 months ago
 - ☆86Updated 2 years ago
 - Crosslingual Reasoning through Test-Time Scaling☆19Updated 5 months ago
 - ☆57Updated 2 years ago
 - Evaluate interpretability methods on localizing and disentangling concepts in LLMs.☆56Updated this week
 - Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]☆31Updated 9 months ago
 - ☆78Updated 2 years ago
 - Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".☆80Updated last year
 - [NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"☆61Updated last year
 - Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆116Updated 8 months ago
 - Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge☆14Updated last year
 - ☆98Updated 2 years ago
 - LoFiT: Localized Fine-tuning on LLM Representations☆42Updated 9 months ago
 - Function Vectors in Large Language Models (ICLR 2024)☆181Updated 6 months ago
 - [ICLR 2025] General-purpose activation steering library☆115Updated last month
 - ☆52Updated 6 months ago
 - [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua…☆35Updated 2 years ago
 - Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se…☆63Updated 2 years ago
 - ☆37Updated 10 months ago