Butanium / nnterp

Unified access to Large Language Model modules using NNsight

☆70

Alternatives and similar repositories for nnterp

Users who are interested in nnterp are comparing it to the libraries listed below.

saprmarks / feature-circuits
☆200 Updated 2 months ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆306 Updated last year
neelnanda-io / Crosscoders
☆58 Updated last year
adamkarvonen / SAEBench
☆138 Updated last week
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆204 Updated last year
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆137 Updated 10 months ago
ArthurConmy / Automatic-Circuit-Discovery
☆262 Updated last year
ARBORproject / arborproject.github.io
☆83 Updated 10 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆234 Updated 2 weeks ago
neelnanda-io / 1L-Sparse-Autoencoder
☆132 Updated 2 years ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆63 Updated last year
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆236 Updated last year
wesg52 / sparse-probing-paper
Sparse probing paper full code.
☆66 Updated 2 years ago
jacobdunefsky / transcoder_circuits
☆193 Updated last year
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆44 Updated last year
curt-tigges / probity
☆20 Updated 8 months ago
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆100 Updated 2 years ago
amack315 / unsupervised-steering-vectors
☆36 Updated last year
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆84 Updated this week
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆289 Updated 2 years ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆285 Updated last year
science-of-finetuning / diffing-toolkit
A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.
☆48 Updated last week
ajobi-uhc / seer
This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix …
☆89 Updated 2 weeks ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆57 Updated 2 months ago
tilde-research / activault
Engine for collecting, uploading, and downloading model activations
☆24 Updated 9 months ago
Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆15 Updated 2 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆56 Updated 8 months ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆33 Updated 6 months ago
redwoodresearch / Easy-Transformer
☆135 Updated last year
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆225 Updated 3 weeks ago