dynamical-inference / patchsae

Implementation of PatchSAE as presented in "Sparse autoencoders reveal selective remapping of visual concepts during adaptation"

☆18

Alternatives and similar repositories for patchsae

Users who are interested in patchsae are comparing it to the libraries listed below.

hamidkazemi22 / CLIPInversion
What do we learn from inverting CLIP models?
☆55 Updated last year
kingdy2002 / SPA
☆18 Updated 6 months ago
gortizji / tangent_task_arithmetic
Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models".
☆102 Updated 2 years ago
dtch1997 / steering-bench
Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"
☆14 Updated 7 months ago
OSU-NLP-Group / saev
Sparse autoencoders for vision
☆37 Updated 3 weeks ago
aypan17 / latentqa
☆19 Updated 3 months ago
locuslab / diffusion-model-hallucination
☆44 Updated 10 months ago
sail-sg / D-TRAK
Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024)
☆31 Updated last year
wang-kee / LiNeS
Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"
☆29 Updated 8 months ago
rohitgandikota / erasing-llm
Erasing conceptual knowledge from language models through low-rank fine-tuning
☆19 Updated 3 months ago
clemneo / llava-interp
☆57 Updated 8 months ago
UKPLab / iclr2024-model-merging
This is the repository for "Model Merging by Uncertainty-Based Gradient Matching", ICLR 2024.
☆28 Updated last year
peterljq / Parsimonious-Concept-Engineering
PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)
☆38 Updated 8 months ago
Heidelberg-NLP / CC-SHAP-VLM
Official code implementation for the paper "Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Expl…
☆12 Updated 3 months ago
nickjiang2378 / vl-interp
Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25)
☆75 Updated last month
revelio-diffusion / revelio
☆22 Updated 3 weeks ago
paulgavrikov / vlm_shapebias
Official code for "Can We Talk Models Into Seeing the World Differently?" (ICLR 2025).
☆26 Updated 5 months ago
pliang279 / HEMM
Holistic evaluation of multimodal foundation models
☆48 Updated 11 months ago
vl-rewardbench / VL_RewardBench
☆16 Updated 2 months ago
apple / ml-act
☆51 Updated 7 months ago
bpwu1 / confidence-regulation-neurons
Confidence Regulation Neurons in Language Models (NeurIPS 2024)
☆10 Updated 5 months ago
yossigandelsman / second_order_lens
Official pytorch implementation of "Interpreting the Second-Order Effects of Neurons in CLIP"
☆39 Updated 8 months ago
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆31 Updated 4 months ago
YuxinWenRick / diffusion_memorization
Official repo for Detecting, Explaining, and Mitigating Memorization in Diffusion Models (ICLR 2024)
☆76 Updated last year
ExplainableML / fomo_in_flux
Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]
☆57 Updated 7 months ago
Trustworthy-ML-Lab / CLIP-dissect
[ICLR 23 spotlight] An automatic and efficient tool to describe functionalities of individual neurons in DNNs
☆54 Updated last year
EvolvingLMMs-Lab / multimodal-sae
[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
☆145 Updated last week
gstoica27 / KnOTS
Model Merging with SVD to Tie the KnOTS [ICLR 2025]
☆59 Updated 3 months ago
lyan62 / FoodieQA
Official Repo for FoodieQA paper (EMNLP 2024)
☆16 Updated 3 weeks ago
jiah-li / magic
The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.
☆10 Updated 7 months ago