shauli-ravfogel / adv-kernel-removal
☆11 Updated last year
Related projects:
- ☆33 Updated 2 years ago
- ☆29 Updated 10 months ago
- ☆54 Updated last week
- ☆66 Updated last month
- This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce… ☆17 Updated 5 months ago
- Universal Neurons in GPT2 Language Models ☆25 Updated 3 months ago
- ☆16 Updated 3 months ago
- Tasks for describing differences between text distributions. ☆15 Updated last month
- ☆27 Updated last year
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆27 Updated 10 months ago
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆15 Updated 3 weeks ago
- ☆49 Updated last year
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch ☆15 Updated last year
- ☆23 Updated last year
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment ☆37 Updated last year
- ☆26 Updated 5 months ago
- ☆12 Updated 8 months ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? ☆19 Updated 3 months ago
- ☆27 Updated last year
- ☆12 Updated 9 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆57 Updated last week
- ☆75 Updated this week
- ☆64 Updated last month
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆23 Updated 3 months ago
- Official Repository for Dataset Inference for LLMs ☆21 Updated last month
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆28 Updated 6 months ago
- WMDP is a LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆72 Updated 4 months ago
- Sparse and discrete interpretability tool for neural networks ☆51 Updated 7 months ago
- ☆23 Updated 4 months ago
- ☆15 Updated 2 months ago