SALT-NLP / chain-of-thought-bias
☆24 · Updated 4 months ago
Alternatives and similar repositories for chain-of-thought-bias:
Users who are interested in chain-of-thought-bias are comparing it to the libraries listed below.
- Restore safety in fine-tuned language models through task arithmetic ☆26 · Updated 10 months ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆57 · Updated last year
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ☆66 · Updated 2 years ago
- ☆22 · Updated 4 months ago
- EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975 ☆37 · Updated last year
- [NAACL'25] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆44 · Updated 2 months ago
- BeHonest: Benchmarking Honesty in Large Language Models ☆31 · Updated 6 months ago
- ☆25 · Updated last year
- Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆63 · Updated 11 months ago
- AbstainQA, ACL 2024 ☆25 · Updated 4 months ago
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… ☆31 · Updated last year
- ☆47 · Updated 10 months ago
- Methods and evaluation for aligning language models temporally ☆27 · Updated 11 months ago
- Evaluating the Ripple Effects of Knowledge Editing in Language Models ☆53 · Updated 10 months ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ☆79 · Updated 5 months ago
- ☆36 · Updated last year
- ☆44 · Updated 5 months ago
- ☆85 · Updated 2 years ago
- Code for ACL 2023 paper "BOLT: Fast Energy-based Controlled Text Generation with Tunable Biases". ☆21 · Updated last year
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆22 · Updated 11 months ago
- ☆17 · Updated last year
- LoFiT: Localized Fine-tuning on LLM Representations ☆32 · Updated last month
- ICLR 2024 paper showing properties of safety tuning and exaggerated safety. ☆77 · Updated 9 months ago
- ☆37 · Updated last year
- ☆85 · Updated last year
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆23 · Updated 7 months ago
- ☆60 · Updated 2 years ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆65 · Updated 10 months ago
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs. ☆46 · Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆62 · Updated 3 months ago