maxdreyer / PURE
Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Workshop (spotlight)
☆14 · Updated 9 months ago
Alternatives and similar repositories for PURE:
Users who are interested in PURE are comparing it to the libraries listed below.
- ☆11 · Updated 4 months ago
- Spurious Features Everywhere - Large-Scale Detection of Harmful Spurious Features in ImageNet · ☆30 · Updated last year
- Overcomplete is a Vision-based SAE Toolbox · ☆42 · Updated this week
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… · ☆11 · Updated last month
- ☆10 · Updated 4 months ago
- A toolkit for quantitative evaluation of data attribution methods. · ☆42 · Updated this week
- What do we learn from inverting CLIP models? · ☆53 · Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods · ☆62 · Updated last week
- Code for the paper: Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery. ECCV 2024. · ☆38 · Updated 4 months ago
- ☆34 · Updated last year
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers · ☆39 · Updated last month
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". · ☆97 · Updated last year
- ☆14 · Updated last year
- ☆38 · Updated last year
- ☆30 · Updated 3 months ago
- A simple and efficient baseline for data attribution · ☆11 · Updated last year
- ☆89 · Updated last month
- ☆21 · Updated 7 months ago
- PyTorch ImageNet1k Loader with Bounding Boxes. · ☆12 · Updated 3 years ago
- ☆10 · Updated 3 months ago
- Multi-Layer Sparse Autoencoders (ICLR 2025) · ☆21 · Updated last month
- ☆71 · Updated this week
- Code for the ICLR 2022 paper "Salient Imagenet: How to discover spurious features in deep learning?" · ☆39 · Updated 2 years ago
- Sparse Autoencoder Training Library · ☆43 · Updated 4 months ago
- ☆17 · Updated 11 months ago
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" · ☆48 · Updated 3 weeks ago
- This repository contains the implementation of Concept Activation Regions, a new framework to explain deep neural networks with human con… · ☆11 · Updated 2 years ago
- Official code for "Can We Talk Models Into Seeing the World Differently?" (ICLR 2025). · ☆21 · Updated last month
- ☆22 · Updated last month
- Code for "CRAFT: Concept Recursive Activation FacTorization for Explainability" (CVPR 2023) · ☆62 · Updated last year