evan-lloyd / graphpatch
graphpatch is a library for activation patching on PyTorch neural network models.
☆13 · Updated last week
Alternatives and similar repositories for graphpatch:
Users who are interested in graphpatch are comparing it to the libraries listed below.
- ☆25 · Updated 6 months ago
- ☆116 · Updated last year
- ☆86 · Updated last week
- ☆16 · Updated 10 months ago
- Latent Diffusion Language Models ☆68 · Updated last year
- Mechanistic Interpretability for Transformer Models ☆49 · Updated 2 years ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆95 · Updated 3 months ago
- ☆54 · Updated 3 months ago
- ☆12 · Updated last week
- ☆25 · Updated 10 months ago
- ☆50 · Updated this week
- ☆86 · Updated last year
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated last year
- Sparse Autoencoder Training Library ☆41 · Updated 3 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 · Updated 8 months ago
- ☆150 · Updated this week
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆40 · Updated 2 months ago
- Universal Neurons in GPT2 Language Models ☆27 · Updated 8 months ago
- WIP ☆93 · Updated 6 months ago
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ☆35 · Updated last year
- ☆18 · Updated last year
- A library for mechanistic anomaly detection ☆19 · Updated last month
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam. ☆73 · Updated 6 months ago
- ☆33 · Updated 5 months ago
- ☆53 · Updated last year
- Sparse and discrete interpretability tool for neural networks ☆58 · Updated last year
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆23 · Updated 10 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆182 · Updated 2 months ago
- ☆29 · Updated this week