MadryLab / AT2
Attribute statements generated by LLMs to preceding tokens using attention weights.
☆12 · Updated this week
Alternatives and similar repositories for AT2:
Users who are interested in AT2 compare it to the repositories listed below.
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ ☆22 · Updated last month
- ☆37 · Updated 4 months ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆43 · Updated 6 months ago
- CSCW 2023 Best Demo Award: Conversational AI Explanations to Support Human-AI Scientific Writing ☆13 · Updated last year
- DecompX: Explaining Transformers Decisions by Propagating Token Decomposition ☆18 · Updated last year
- ☆34 · Updated last year
- Materials for "Quantifying the Plausibility of Context Reliance in Neural Machine Translation" at ICLR'24 🐑 🐑 ☆14 · Updated last year
- [NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers ☆21 · Updated last year
- Measuring the Mixing of Contextual Information in the Transformer ☆29 · Updated last year
- ☆49 · Updated last year
- CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior ☆12 · Updated 2 years ago
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆37 · Updated 2 years ago
- Codes for "Benchmarking the Generation of Fact Checking Explanations" ☆10 · Updated 8 months ago
- 👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark" ☆20 · Updated last year
- ☆26 · Updated 2 years ago
- This repository contains data, code and models for contextual noncompliance. ☆21 · Updated 9 months ago
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales" ☆15 · Updated last year
- The codebase for Inducing Causal Structure for Interpretable Neural Networks ☆10 · Updated 3 years ago
- Codebase for running (conditional) probing experiments ☆22 · Updated 2 years ago
- State of What Art? A Call for Multi-Prompt LLM Evaluation ☆14 · Updated 9 months ago
- The geometry of multilingual language model representations (EMNLP 2022). ☆20 · Updated 2 years ago
- ☆23 · Updated last year
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆11 · Updated 2 months ago
- Can Large Language Models Be an Alternative to Human Evaluations? ☆9 · Updated last year
- Code for preprint: Summarizing Differences between Text Distributions with Natural Language ☆42 · Updated 2 years ago
- ☆16 · Updated 2 months ago
- Find informative examples to efficiently (human)-evaluate NLG models. ☆10 · Updated last month
- Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al., EMNLP 2024) ☆13 · Updated 6 months ago
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023. ☆41 · Updated last year
- A python package to run inference with HuggingFace language and vision-language checkpoints wrapping many convenient features. ☆27 · Updated 7 months ago