hannamw / MIB-circuit-track
☆14 Updated 3 weeks ago
Alternatives and similar repositories for MIB-circuit-track
Users who are interested in MIB-circuit-track compare it to the libraries listed below
- ☆35 Updated last month
- Landing page for MIB: A Mechanistic Interpretability Benchmark ☆12 Updated last week
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch ☆15 Updated last year
- ☆10 Updated 2 years ago
- ☆18 Updated last year
- The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization". ☆33 Updated 2 weeks ago
- ☆19 Updated last year
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆18 Updated 5 months ago
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆38 Updated 2 years ago
- ☆107 Updated 3 years ago
- Code for NAACL 2022 paper "Reframing Human-AI Collaboration for Generating Free-Text Explanations" ☆31 Updated 2 years ago
- ☆35 Updated 2 years ago
- In-context Example Selection with Influences ☆15 Updated 2 years ago
- ☆36 Updated 2 years ago
- Code for paper "Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?" ☆22 Updated 4 years ago
- Code for preprint: Summarizing Differences between Text Distributions with Natural Language ☆42 Updated 2 years ago
- ☆24 Updated 4 years ago
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs" ☆28 Updated 3 years ago
- Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State ☆18 Updated last year
- ☆34 Updated last year
- ☆14 Updated last year
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆11 Updated 5 months ago
- Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021) ☆12 Updated last year
- A library for efficient patching and automatic circuit discovery. ☆67 Updated 2 months ago
- ☆22 Updated 3 years ago
- Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding ☆18 Updated 2 years ago
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆27 Updated last year
- ☆44 Updated 7 months ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆35 Updated last year
- Teaching Models to Express Their Uncertainty in Words ☆39 Updated 3 years ago