dynamical-inference / patchsae
Implementation of PatchSAE as presented in "Sparse autoencoders reveal selective remapping of visual concepts during adaptation"
☆29 Updated last month
Alternatives and similar repositories for patchsae
Users who are interested in patchsae are comparing it to the repositories listed below.
- Localization of Knowledge in Text-to-Image Models ☆11 Updated last year
- What do we learn from inverting CLIP models? ☆57 Updated last year
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆107 Updated 2 years ago
- Sparse autoencoders for vision ☆52 Updated this week
- Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024) ☆35 Updated last year
- ☆24 Updated 11 months ago
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors" ☆18 Updated last year
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆41 Updated last year
- [ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models ☆149 Updated 6 months ago
- Code for the paper: Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery. ECCV 2024. ☆52 Updated last year
- [ICLR24 (Spotlight)] "SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation… ☆141 Updated 6 months ago
- 👋 Overcomplete is a Vision-based SAE Toolbox ☆109 Updated 2 weeks ago
- Erasing conceptual knowledge from language models through low-rank fine-tuning ☆19 Updated 8 months ago
- ☆76 Updated last year
- [ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliabilit… ☆28 Updated 4 months ago
- ☆136 Updated last month
- ☆16 Updated 7 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆78 Updated last year
- [ICLR 2025] This repository contains the code to reproduce the results from our paper From Sparse Dependence to Sparse Attention: Unveili… ☆10 Updated 9 months ago
- Official repo for Detecting, Explaining, and Mitigating Memorization in Diffusion Models (ICLR 2024) ☆77 Updated last year
- [ICML 2025] Unlearning in Diffusion Models using Sparse Autoencoders ☆48 Updated 2 months ago
- ☆23 Updated last month
- ☆112 Updated 10 months ago
- Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works… ☆19 Updated last year
- Code for [Re] On the Reproducibility of Post-Hoc Concept Bottleneck Models. ☆12 Updated last year
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ☆74 Updated 9 months ago
- [NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models ☆52 Updated 3 weeks ago
- ☆23 Updated last year
- ViT Prisma is a mechanistic interpretability library for Vision and Video Transformers (ViTs). ☆327 Updated 4 months ago
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024] ☆51 Updated last year