evan-lloyd / graphpatch
graphpatch is a library for activation patching on PyTorch neural network models.
☆14 · Updated 3 months ago
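Activation patching means running a model on a "clean" input and caching an intermediate activation. You then rerun the model on a "corrupted" input with that cached activation substituted in, and measure how much the output recovers. The sketch below shows this idea with plain PyTorch forward hooks. It does not use graphpatch's own API. The toy model, inputs, and patch site (`model[1]`) are hypothetical stand-ins chosen for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical toy model; any nn.Module with addressable submodules works similarly.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

clean_input = torch.randn(1, 4)
corrupted_input = torch.randn(1, 4)
site = model[1]  # hypothetical patch site: the output of the ReLU

# 1. Clean run: cache the activation produced at the chosen site.
cache = {}

def save_hook(module, inputs, output):
    cache["act"] = output.detach().clone()

handle = site.register_forward_hook(save_hook)
with torch.no_grad():
    clean_out = model(clean_input)
handle.remove()

# 2. Corrupted run with the cached clean activation patched in.
def patch_hook(module, inputs, output):
    # Returning a tensor from a forward hook replaces the module's output.
    return cache["act"]

handle = site.register_forward_hook(patch_hook)
with torch.no_grad():
    patched_out = model(corrupted_input)
handle.remove()

# 3. Unpatched corrupted run, as a baseline for comparison.
with torch.no_grad():
    corrupted_out = model(corrupted_input)

print("clean:    ", clean_out)
print("corrupted:", corrupted_out)
print("patched:  ", patched_out)
```

Here the whole layer output is patched, and everything downstream depends only on that layer. The patched output therefore matches the clean output exactly. In practice you would patch a narrower slice, such as particular token positions, attention heads, or neurons. You would then measure how far the output moves toward the clean run.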
Alternatives and similar repositories for graphpatch
Users interested in graphpatch are comparing it to the libraries listed below.
- Mechanistic Interpretability for Transformer Models ☆50 · Updated 2 years ago
- ☆27 · Updated 9 months ago
- Sparse Autoencoder Training Library ☆49 · Updated 2 weeks ago
- ☆27 · Updated last year
- ☆121 · Updated last year
- MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing… ☆10 · Updated 7 months ago
- ☆93 · Updated last month
- A library for mechanistic anomaly detection ☆21 · Updated 4 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆201 · Updated 5 months ago
- ☆93 · Updated 3 months ago
- Sparse and discrete interpretability tool for neural networks ☆62 · Updated last year
- ☆38 · Updated last week
- ☆33 · Updated 2 weeks ago
- A collection of different ways to implement accessing and modifying internal model activations for LLMs ☆16 · Updated 6 months ago
- ☆33 · Updated 4 months ago
- ☆31 · Updated last year
- A Python SDK for LLM fine-tuning and inference on RunPod infrastructure ☆11 · Updated last week
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 · Updated 2 years ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆31 · Updated 11 months ago
- Steering vectors for transformer language models in PyTorch / Hugging Face ☆100 · Updated 2 months ago
- ☆12 · Updated this week
- A library for efficient patching and automatic circuit discovery. ☆64 · Updated 3 weeks ago
- Simple implementation of muP, based on "A Spectral Condition for Feature Learning". The implementation is SGD-only; don't use it with Adam. ☆76 · Updated 9 months ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆121 · Updated 2 years ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆46 · Updated 7 months ago
- Applying SAEs for fine-grained control ☆18 · Updated 5 months ago
- Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs) ☆38 · Updated last week
- Open-source replication of Anthropic's Crosscoders for Model Diffing ☆55 · Updated 6 months ago
- ☆223 · Updated 7 months ago
- Utilities for the HuggingFace transformers library ☆67 · Updated 2 years ago
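One entry in the list above implements the BatchTopK activation for sparse autoencoders. Per-sample TopK keeps the k largest latents in every row. BatchTopK instead keeps the k × batch_size largest activations across the whole batch, so individual samples can use more or fewer latents. The sketch below is a minimal illustration, not that repository's code. The function name `batch_topk` is hypothetical, and it assumes a ReLU is applied before selection, as in common TopK SAE implementations.

```python
import torch

def batch_topk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Hypothetical helper: keep the k * batch_size largest activations
    across the whole batch and zero out everything else."""
    batch_size = pre_acts.shape[0]
    acts = torch.relu(pre_acts)           # assumption: ReLU before selection
    flat = acts.flatten()                 # pool all latents in the batch
    top = torch.topk(flat, k * batch_size)
    out = torch.zeros_like(flat)
    out.scatter_(0, top.indices, top.values)  # restore only the selected values
    return out.view_as(acts)

pre = torch.tensor([[3.0, 2.5, 0.5],
                    [0.2, 0.1, 1.0]])
result = batch_topk(pre, k=1)
print(result)
```

With k=1 and a batch of two, the two largest activations overall are kept. Both fall in the first row here, so that row keeps two latents and the second keeps none. This variable per-sample sparsity is what distinguishes BatchTopK from per-sample TopK.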