hannamw/eap-ig-faithfulness

Code for "Automatic Circuit Finding and Faithfulness"

☆18

Alternatives and similar repositories for eap-ig-faithfulness

Users who are interested in eap-ig-faithfulness are comparing it to the libraries listed below.

hannamw / EAP-IG
☆80 · May 23, 2026 · Updated last month
zjunlp / ModelKinship
Exploring Model Kinship for Merging Large Language Models
☆28 · Apr 16, 2025 · Updated last year
kojima-takeshi188 / lang_neuron
☆21 · Jun 24, 2024 · Updated 2 years ago
bilal-chughtai / rep-theory-mech-interp
☆30 · May 4, 2023 · Updated 3 years ago
zepingyu0512 / in-context-mechanism
code for EMNLP 2024 paper: How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for M…
☆13 · Nov 17, 2024 · Updated last year
zepingyu0512 / arithmetic-mechanism
code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
☆12 · Nov 17, 2024 · Updated last year
ArthurConmy / Automatic-Circuit-Discovery
☆293 · Oct 1, 2024 · Updated last year
shankarp8 / knowledge_distillation
Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023).
☆27 · Aug 25, 2024 · Updated last year
science-of-finetuning / diffing-toolkit
A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.
☆75 · Jul 3, 2026 · Updated last week
Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆17 · Oct 21, 2025 · Updated 8 months ago
tiagofrepereira2012 / gradients_without_backpropagation
☆12 · Feb 23, 2022 · Updated 4 years ago
real-absolute-AI / Unnatural_Language
The official repository of 'Unnatural Language Are Not Bugs but Features for LLMs'
☆24 · May 20, 2025 · Updated last year
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 · Jan 19, 2025 · Updated last year
CSOgroup / TAD-benchmarking-scripts
Repository containing the scripts regarding analyses in Zufferey & Tavernari et al.
☆10 · Feb 22, 2021 · Updated 5 years ago
zjunlp / CaKE
[EMNLP 2025] Circuit-Aware Editing Enables Generalizable Knowledge Learners
☆19 · Nov 17, 2025 · Updated 7 months ago
zjunlp / KnowledgeCircuits
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
☆172 · Nov 14, 2025 · Updated 7 months ago
CHATS-lab / LLMs_Encode_Harmfulness_Refusal_Separately
☆35 · Jul 3, 2026 · Updated last week
trestad / Factual-Recall-Mechanism
The code for paper Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models.
☆13 · Apr 10, 2024 · Updated 2 years ago
kdu4108 / semiring-backprop-exps
☆16 · Jul 10, 2023 · Updated 3 years ago
zjunlp / PitfallsKnowledgeEditing
[ICLR 2024] Unveiling the Pitfalls of Knowledge Editing for Large Language Models
☆22 · Jun 13, 2024 · Updated 2 years ago
science-of-finetuning / crosscoder_learning
Modified to support crosscoder training.
☆27 · Jul 2, 2026 · Updated last week
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆199 · Apr 30, 2026 · Updated 2 months ago
loftusa / owls
Subliminal learning in LLMs: language models can transmit hidden preferences through seemingly unrelated training data.
☆24 · Nov 9, 2025 · Updated 8 months ago
zhangt2333 / SCMIS
A Student-Course-Manage-Info-System (a student course-selection management information system).
☆11 · Feb 7, 2021 · Updated 5 years ago
adagorgun / awesome-generative-explainability
A collection of research materials on explainable generative models
☆24 · Jun 30, 2026 · Updated last week
edenbiran / HoppingTooLate
Exploring the Limitations of Large Language Models on Multi-Hop Queries
☆33 · Mar 2, 2025 · Updated last year
wuyike2000 / CoTKR
☆32 · Jan 13, 2025 · Updated last year
ljhzxc / criteo_ctr_model_pytorch
Reproduces several CTR models in PyTorch on the Criteo dataset
☆13 · Jun 20, 2020 · Updated 6 years ago
cooperleong00 / Awesome-LLM-Interpretability
A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.
☆309 · Jan 22, 2026 · Updated 5 months ago
pkunlp-icler / IKE
☆25 · Feb 27, 2023 · Updated 3 years ago
TeunvdWeij / sandbagging
☆20 · Nov 15, 2024 · Updated last year
zepingyu0512 / awesome-LLM-neuron
☆36 · Jun 13, 2025 · Updated last year
Thartvigsen / GRACE
[NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
☆85 · Dec 21, 2024 · Updated last year
Super262 / MagicMirror
AI-Powered Application of Make-up on Photos
☆21 · Aug 30, 2021 · Updated 4 years ago
hartvigsen-group / composable-interventions
☆29 · Feb 27, 2025 · Updated last year
rhubarbwu / linguistic-collapse
Codebase for Linguistic Collapse: Neural Collapse in (Large) Language Models [NeurIPS 2024] [arXiv:2405.17767]
☆19 · Apr 14, 2025 · Updated last year
ApolloResearch / deception-detection
☆44 · Feb 11, 2025 · Updated last year
NLie2 / what_features_jailbreak_LLMs
☆18 · Mar 30, 2025 · Updated last year
dvruette / concept-guidance
Code accompanying the paper "A Language Model's Guide Through Latent Space". It contains functionality for training and using concept vec…
☆21 · Feb 23, 2024 · Updated 2 years ago