tatsu-lab / opinions_qa
☆100 · Updated 8 months ago
Alternatives and similar repositories for opinions_qa:
Users interested in opinions_qa are comparing it to the repositories listed below.
- Inspecting and Editing Knowledge Representations in Language Models ☆111 · Updated last year
- ☆44 · Updated last year
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod…" ☆31 · Updated 10 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆66 · Updated last year
- ☆44 · Updated 4 months ago
- Repository for the Bias Benchmark for QA dataset ☆94 · Updated last year
- ☆22 · Updated 10 months ago
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆77 · Updated last year
- ☆61 · Updated last year
- Dataset associated with the "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆70 · Updated 3 years ago
- Evaluating the Moral Beliefs Encoded in LLMs ☆23 · Updated last month
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca…" ☆58 · Updated last year
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆40 · Updated last month
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity ☆61 · Updated 2 months ago
- Repo accompanying the paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers" ☆64 · Updated 10 months ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆57 · Updated last year
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs) ☆46 · Updated 5 months ago
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" (EMNLP 2023) ☆40 · Updated last year
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories" by Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… ☆89 · Updated 3 years ago
- Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆39 · Updated last month
- LLM experiments done during SERI MATS, focusing on activation steering / interpreting activation spaces ☆85 · Updated last year
- The LM Contamination Index is a manually created database of contamination evidence for LMs ☆75 · Updated 9 months ago
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ☆66 · Updated 2 years ago
- ☆152 · Updated last month
- The Prism Alignment Project ☆61 · Updated 8 months ago
- ☆29 · Updated 8 months ago
- ☆75 · Updated 5 months ago
- LoFiT: Localized Fine-tuning on LLM Representations