hannamw/MIB-circuit-track

☆24

Alternatives and similar repositories for MIB-circuit-track

Users who are interested in MIB-circuit-track are comparing it to the libraries listed below.

goodfire-ai / causalab
☆104 · Jul 15, 2026 · Updated 2 weeks ago
hannamw / EAP-IG
☆84 · May 23, 2026 · Updated 2 months ago
Nix07 / finetuning
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity…
☆32 · Oct 27, 2025 · Updated 9 months ago
ndif-team / workbench
☆16 · Jul 22, 2026 · Updated last week
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆68 · Oct 27, 2024 · Updated last year
callummcdougall / path_patching
Implementation of path patching & activation patching (will eventually add to TransformerLens).
☆15 · Jan 8, 2024 · Updated 2 years ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆99 · Dec 31, 2025 · Updated 6 months ago
PalisadeResearch / ctfish
Chess agent specification gaming
☆25 · Updated this week
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 · Jul 18, 2024 · Updated 2 years ago
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆20 · Jan 28, 2025 · Updated last year
Heidelberg-NLP / CC-SHAP
Code for "On Measuring Faithfulness of Natural Language Explanations"
☆23 · Jul 14, 2026 · Updated 2 weeks ago
keing1 / reward-hack-generalization
Datasets used in the paper "Reward hacking behavior can generalize across tasks"
☆15 · Aug 17, 2025 · Updated 11 months ago
FlyingPumba / InterpBench
A benchmark for mechanistic discovery of circuits in Transformers
☆17 · Dec 15, 2024 · Updated last year
raybears / cot-transparency
Improving transparency of large language models' reasoning
☆15 · Nov 25, 2025 · Updated 8 months ago
successar / FRESH
☆26 · Jun 12, 2023 · Updated 3 years ago
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆199 · Apr 30, 2026 · Updated 2 months ago
sciai-lab / Truth_is_Universal
☆34 · Nov 7, 2024 · Updated last year
FarnoushRJ / RelP
[NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in La…
☆29 · Nov 3, 2025 · Updated 8 months ago
VITA-Group / DnA
[ECCV 2022] "Improve Few-Shot Transfer Learning with Low-Rank Decompose and Align" by Ziyu Jiang, Tianlong Chen, Xuxi Chen, Yu Cheng, Luo…
☆13 · Jul 19, 2022 · Updated 4 years ago
Jiaxin-Wen / MisleadLM
Official code for our paper "Language Models Learn to Mislead Humans via RLHF"
☆20 · Oct 11, 2024 · Updated last year
batu-el / molochs-bargain
☆15 · May 7, 2026 · Updated 2 months ago
openai / monitorability-evals
Open-sourced evaluation suite from the Monitoring Monitorability paper
☆88 · Jun 11, 2026 · Updated last month
allenai / few_shot_explanations
Code for NAACL 2022 paper "Reframing Human-AI Collaboration for Generating Free-Text Explanations"
☆29 · Apr 28, 2023 · Updated 3 years ago
aadityasingh / icl-dynamics
☆26 · Feb 20, 2026 · Updated 5 months ago
anthropics / sycophancy-to-subterfuge-paper
☆28 · Sep 5, 2024 · Updated last year
ndif-team / ndif
The NDIF server, which performs deep inference and serves nnsight requests remotely
☆50 · Updated this week
noanabeshima / matryoshka-saes
☆33 · Nov 28, 2024 · Updated last year
tim-hua-01 / steering-eval-awareness-public
☆17 · Mar 16, 2026 · Updated 4 months ago
saprmarks / geometry-of-truth
☆114 · Aug 8, 2024 · Updated last year
niconi19 / Emergent-Response-Planning-in-LLMs
[ICML 2025] Emergent Response Planning in LLMs
☆20 · Jul 1, 2025 · Updated last year
curt-tigges / probity
☆19 · Apr 10, 2025 · Updated last year
pratyushmaini / localizing-memorization
Official Repository for ICML 2023 paper "Can Neural Network Memorization Be Localized?"
☆21 · Oct 26, 2023 · Updated 2 years ago
technion-cs-nlp / parametric-faithfulness
☆23 · Aug 30, 2025 · Updated 10 months ago
evan-lloyd / graphpatch
graphpatch is a library for activation patching on PyTorch neural network models.
☆21 · Feb 11, 2025 · Updated last year
EleutherAI / attribute
☆16 · Nov 14, 2025 · Updated 8 months ago
sleepinyourhat / quora-duplicate-questions-util
Converts Quora's new NLU dataset to SNLI txt/jsonl format, plus test/dev split, tokenization.
☆14 · Jan 27, 2017 · Updated 9 years ago
lilt / tec
Evaluation code and data for "Automatic Correction of Human Translations" [NAACL 2022].
☆19 · Dec 9, 2022 · Updated 3 years ago
technion-cs-nlp / llm-arithmetic-heuristics
☆27 · Jun 9, 2026 · Updated last month
ndif-team / nnterp
Unified access to Large Language Model modules using NNsight
☆116 · Updated this week