maxdreyer / PURE
Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Workshop (spotlight)
☆ 19 · Updated last year
Alternatives and similar repositories for PURE
Users that are interested in PURE are comparing it to the libraries listed below
- ☆ 51 · Updated 2 years ago
- Overcomplete is a Vision-based SAE Toolbox ☆ 101 · Updated 2 weeks ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features ☆ 25 · Updated last year
- Sparse Autoencoder Training Library ☆ 55 · Updated 6 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆ 141 · Updated 4 months ago
- ☆ 194 · Updated last month
- ☆ 23 · Updated last year
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Hugging Face Transformers ☆ 41 · Updated 9 months ago
- ☆ 34 · Updated 2 years ago
- ☆ 37 · Updated 11 months ago
- Tools for optimizing steering vectors in LLMs. ☆ 14 · Updated 7 months ago
- ☆ 16 · Updated 6 months ago
- ☆ 110 · Updated 9 months ago
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" ☆ 63 · Updated 5 months ago
- ☆ 136 · Updated this week
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆ 225 · Updated last week
- Steering vectors for transformer language models in PyTorch / Hugging Face ☆ 129 · Updated 8 months ago
- [ICLR 2025] General-purpose activation steering library ☆ 119 · Updated 2 months ago
- Spurious Features Everywhere - Large-Scale Detection of Harmful Spurious Features in ImageNet ☆ 32 · Updated 2 years ago
- Trains Sparse Autoencoders based on outputs from language models ☆ 11 · Updated last year
- [ICML 24] A novel automated neuron explanation framework that can accurately describe poly-semantic concepts in deep neural networks ☆ 13 · Updated 6 months ago
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆ 40 · Updated last year
- ☆ 75 · Updated 2 years ago
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆ 12 · Updated 9 months ago
- ☆ 19 · Updated 7 months ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆ 56 · Updated 3 weeks ago
- Code repo for the model organisms and convergent directions of EM papers. ☆ 36 · Updated last month
- What do we learn from inverting CLIP models? ☆ 56 · Updated last year
- ☆ 59 · Updated 2 years ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆ 42 · Updated last year