Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Workshop (spotlight)
☆20 · May 29, 2024 · Updated last year
Alternatives and similar repositories for PURE
Users that are interested in PURE are comparing it to the libraries listed below
- [TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models ☆10 · Feb 20, 2025 · Updated last year
- A tiny easily hackable implementation of a feature dashboard. ☆15 · Oct 21, 2025 · Updated 4 months ago
- [ICML 24] A novel automated neuron explanation framework that can accurately describe poly-semantic concepts in deep neural networks ☆14 · May 2, 2025 · Updated 10 months ago
- A toolkit for quantitative evaluation of data attribution methods. ☆56 · Jul 14, 2025 · Updated 7 months ago
- Code for CVPR 2024 Oral "Neural Lineage" ☆17 · Jun 18, 2024 · Updated last year
- Official code for "Can We Talk Models Into Seeing the World Differently?" (ICLR 2025). ☆28 · Jan 26, 2025 · Updated last year
- Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers, Paper accepted at eXCV workshop of ECCV 2… ☆30 · Jan 6, 2025 · Updated last year
- Mechanistic understanding and validation of large AI models with SemanticLens ☆50 · Dec 4, 2025 · Updated 2 months ago
- ViT Prisma is a mechanistic interpretability library for Vision and Video Transformers (ViTs). ☆342 · Jul 23, 2025 · Updated 7 months ago
- A Robot that classifies digits and shapes ☆10 · Jul 10, 2019 · Updated 6 years ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆47 · May 31, 2024 · Updated last year
- Zennit is a high-level framework in Python using PyTorch for explaining/exploring neural networks using attribution methods like LRP. ☆241 · Jan 30, 2026 · Updated last month
- ☆52 · Oct 23, 2023 · Updated 2 years ago
- Trains small LMs. Designed for training on SimpleStories ☆12 · Sep 15, 2025 · Updated 5 months ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch ☆10 · Aug 7, 2024 · Updated last year
- DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model. ☆12 · May 29, 2023 · Updated 2 years ago
- Code for the ICLR 2022 paper. Salient Imagenet: How to discover spurious features in deep learning? ☆41 · Aug 19, 2022 · Updated 3 years ago
- ☆10 · Aug 14, 2023 · Updated 2 years ago
- ☆15 · Mar 13, 2025 · Updated 11 months ago
- Will help you with writing a report! ☆10 · Mar 10, 2018 · Updated 7 years ago
- ☆13 · Apr 10, 2025 · Updated 10 months ago
- A library for training crosscoders ☆16 · May 28, 2025 · Updated 9 months ago
- ☆15 · Aug 19, 2025 · Updated 6 months ago
- A repository related to the paper 'Evaluating Reliability in Medical DNNs: A Critical Analysis of Feature and Confidence-Based OOD Detect… ☆10 · Dec 5, 2024 · Updated last year
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- Applications for OpenCL testing on Toradex Apalis iMX6Q ☆12 · Dec 2, 2022 · Updated 3 years ago
- [ECCV 2024] Characterizing Robustness via Natural Input Gradients ☆13 · Oct 14, 2024 · Updated last year
- ☆72 · Jul 24, 2025 · Updated 7 months ago
- ☆209 · Oct 14, 2025 · Updated 4 months ago
- How do transformer LMs encode relations? ☆56 · Feb 24, 2024 · Updated 2 years ago
- ☆11 · Nov 9, 2023 · Updated 2 years ago
- Trains Sparse Autoencoders based on outputs from language models ☆11 · Oct 7, 2024 · Updated last year
- LLM plays 20 questions with itself ☆12 · Mar 31, 2023 · Updated 2 years ago
- Minimalistic AI library that resembles HF's transformers ☆13 · Dec 31, 2024 · Updated last year
- Official repository of Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization [EMNLP'22 … ☆10 · May 20, 2023 · Updated 2 years ago
- ☆12 · Jan 10, 2023 · Updated 3 years ago
- Accompanying code for "Analyzing Vision Transformers in Class Embedding Space" (NeurIPS '23) ☆15 · Jun 10, 2024 · Updated last year
- ☆13 · Apr 8, 2023 · Updated 2 years ago
- Fluent dreaming for language models ☆13 · Jul 22, 2024 · Updated last year