TransformerLensOrg / CircuitsVis

Mechanistic Interpretability Visualizations using React

☆272

Alternatives and similar repositories for CircuitsVis

Users who are interested in CircuitsVis are comparing it to the libraries listed below.

ArthurConmy / Automatic-Circuit-Discovery
☆233 Updated 10 months ago
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆619 Updated this week
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆207 Updated 7 months ago
neelnanda-io / 1L-Sparse-Autoencoder
☆123 Updated last year
saprmarks / dictionary_learning
☆320 Updated 2 weeks ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆257 Updated last year
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆595 Updated this week
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆219 Updated last year
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆512 Updated last year
jbloomAus / SAELens
Training Sparse Autoencoders on Language Models
☆895 Updated this week
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆200 Updated this week
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆261 Updated last year
adamkarvonen / SAEBench
☆107 Updated 2 weeks ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆100 Updated last month
jacobdunefsky / transcoder_circuits
☆154 Updated 8 months ago
neelnanda-io / Crosscoders
☆50 Updated 8 months ago
saprmarks / feature-circuits
☆183 Updated 2 weeks ago
collin-burns / discovering_latent_knowledge
☆274 Updated last year
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆119 Updated 5 months ago
redwoodresearch / Easy-Transformer
☆121 Updated 11 months ago
ARBORproject / arborproject.github.io
☆81 Updated 5 months ago
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆195 Updated 8 months ago
alan-cooney / transformer-from-scratch
Decoder only transformer, built from scratch with PyTorch
☆30 Updated last year
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆127 Updated 2 years ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆231 Updated 6 months ago
Butanium / nnterp
Unified access to Large Language Model modules using NNsight
☆32 Updated last week
METR / task-standard
METR Task Standard
☆154 Updated 5 months ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆207 Updated last week
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆167 Updated last year
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆57 Updated 9 months ago