vedantpalit / Towards-Vision-Language-Mechanistic-Interpretability
This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper accepted at the ICCV CLVL Workshop 2023
☆21Updated 11 months ago
Alternatives and similar repositories for Towards-Vision-Language-Mechanistic-Interpretability:
Users who are interested in Towards-Vision-Language-Mechanistic-Interpretability are comparing it to the repositories listed below
- ☆13Updated last year
- Self-Supervised Alignment with Mutual Information☆16Updated 10 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…☆26Updated 10 months ago
- A library for efficient patching and automatic circuit discovery.☆59Updated last month
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods☆62Updated this week
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"☆67Updated 9 months ago
- ☆31Updated 2 months ago
- ☆34Updated last year
- ☆38Updated last year
- ☆15Updated 6 months ago
- Simple and scalable tools for data-driven pretraining data selection.☆18Updated last month
- ☆25Updated 7 months ago
- ☆20Updated 10 months ago
- ☆49Updated 7 months ago
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…☆11Updated 2 months ago
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca…☆59Updated last year
- ☆82Updated 7 months ago
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity…☆25Updated last year
- Tasks for describing differences between text distributions.☆16Updated 7 months ago
- ☆17Updated 5 months ago
- Augmenting Statistical Models with Natural Language Parameters☆23Updated 6 months ago
- Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076.☆25Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering☆52Updated 4 months ago
- Learning adapter weights from task descriptions☆16Updated last year
- Evaluate the Quality of Critique☆34Updated 9 months ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin…☆48Updated 10 months ago
- ☆29Updated 11 months ago
- [EMNLP 2023, Findings] GRACE: Discriminator-Guided Chain-of-Thought Reasoning☆47Updated 5 months ago
- Accompanying code for "Boosted Prompt Ensembles for Large Language Models"☆30Updated last year
- ☆20Updated last year