This is the official repository for the paper "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP", accepted at the ICCV CLVL Workshop 2023.
☆25 · Feb 16, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for Towards-Vision-Language-Mechanistic-Interpretability
Users who are interested in Towards-Vision-Language-Mechanistic-Interpretability are comparing it to the repositories listed below.
- Applying SAEs for fine-grained control ☆25 · Dec 15, 2024 · Updated last year
- ☆25 · Apr 23, 2024 · Updated last year
- ☆13 · Apr 10, 2025 · Updated 10 months ago
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆30 · Oct 27, 2025 · Updated 4 months ago
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons ☆13 · Feb 13, 2023 · Updated 3 years ago
- ☆13 · Feb 24, 2025 · Updated last year
- A tiny easily hackable implementation of a feature dashboard. ☆15 · Oct 21, 2025 · Updated 4 months ago
- ☆16 · Jun 19, 2023 · Updated 2 years ago
- Experiments with representation engineering ☆14 · Feb 28, 2024 · Updated 2 years ago
- Memory-Based Meta-Learning on Non-Stationary Distributions ☆17 · Mar 14, 2024 · Updated last year
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features" ☆17 · Mar 31, 2025 · Updated 11 months ago
- ☆17 · Jun 8, 2019 · Updated 6 years ago
- ☆17 · Feb 14, 2024 · Updated 2 years ago
- ☆52 · Oct 23, 2023 · Updated 2 years ago
- Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works… ☆20 · May 29, 2024 · Updated last year
- [ICML 24] A novel automated neuron explanation framework that can accurately describe poly-semantic concepts in deep neural networks ☆14 · May 2, 2025 · Updated 10 months ago
- ☆18 · Feb 25, 2026 · Updated last week
- ☆19 · Mar 5, 2024 · Updated 2 years ago
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆20 · Jan 19, 2025 · Updated last year
- Quantum Fast Approximate Synthesis Tool ☆19 · Jan 23, 2023 · Updated 3 years ago
- BCQ tutorial for transformers ☆17 · Jul 17, 2023 · Updated 2 years ago
- Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network" ☆27 · Jun 4, 2024 · Updated last year
- ☆23 · Jun 13, 2024 · Updated last year
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment ☆21 · Apr 2, 2024 · Updated last year
- Data for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" ☆20 · Oct 26, 2023 · Updated 2 years ago
- A quantum circuit optimizer based on sum-over-paths representations ☆26 · Nov 8, 2019 · Updated 6 years ago
- (NeurIPS '22) LISA: Learning Interpretable Skill Abstractions - A framework for unsupervised skill learning using Imitation ☆29 · Feb 22, 2023 · Updated 3 years ago
- Code for "Counterfactual Token Generation in Large Language Models", Arxiv 2024. ☆32 · Nov 7, 2024 · Updated last year
- ☆24 · Jan 28, 2025 · Updated last year
- ☆79 · Nov 5, 2024 · Updated last year
- ☆23 · Apr 17, 2022 · Updated 3 years ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features ☆27 · Nov 20, 2024 · Updated last year
- A Mechanistic Interpretability Analysis of Grokking ☆27 · Sep 26, 2022 · Updated 3 years ago
- RewardAnything: Generalizable Principle-Following Reward Models ☆45 · Jun 11, 2025 · Updated 8 months ago
- ☆27 · Oct 22, 2024 · Updated last year
- Algebraic value editing in pretrained language models ☆69 · Nov 1, 2023 · Updated 2 years ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆131 · Mar 9, 2024 · Updated last year
- ☆28 · May 4, 2023 · Updated 2 years ago
- Optim4RL is a Jax framework of learning to optimize for reinforcement learning. ☆28 · Nov 27, 2024 · Updated last year