ArthurConmy / MishformerLens

MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing a custom, numerically inaccurate Transformer architecture.

☆10

Alternatives and similar repositories for MishformerLens

Users who are interested in MishformerLens are comparing it to the libraries listed below.

Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆15 Updated last month
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 Updated 9 months ago
tilde-research / activault
Engine for collecting, uploading, and downloading model activations
☆24 Updated 6 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆55 Updated 5 months ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆221 Updated 10 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆218 Updated this week
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆125 Updated 7 months ago
amack315 / unsupervised-steering-vectors
☆36 Updated last year
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆238 Updated 8 months ago
oli-clive-griffin / crosscode
A library for training crosscoders
☆11 Updated 4 months ago
curt-tigges / crosslayer-coding
☆15 Updated 3 months ago
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆81 Updated 10 months ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆77 Updated 2 months ago
neelnanda-io / 1L-Sparse-Autoencoder
☆128 Updated last year
jbloomAus / SAEDashboard
☆73 Updated last week
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆31 Updated 4 months ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆293 Updated 10 months ago
KihoPark / LLM_Categorical_Hierarchical_Representations
☆109 Updated 8 months ago
ARBORproject / arborproject.github.io
☆81 Updated 7 months ago
evan-lloyd / graphpatch
graphpatch is a library for activation patching on PyTorch neural network models.
☆20 Updated 8 months ago
evandez / relations
How do transformer LMs encode relations?
☆55 Updated last year
curt-tigges / probity
☆19 Updated 6 months ago
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆42 Updated last year
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆59 Updated 11 months ago
ArthurConmy / Automatic-Circuit-Discovery
☆244 Updated last year
google-deepmind / mishax
☆142 Updated last month
science-of-finetuning / crosscoder_learning
Modified to support crosscoder training.
☆23 Updated last week
hannamw / EAP-IG
☆53 Updated 2 months ago
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆247 Updated last year
Butanium / nnterp
Unified access to Large Language Model modules using NNsight
☆49 Updated last week