ndif-team / nnsight

The nnsight package enables interpreting and manipulating the internals of deep learned models.

☆685

Alternatives and similar repositories for nnsight

Users who are interested in nnsight are comparing it to the libraries listed below.

TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆296 Updated 10 months ago
jbloomAus / SAELens
Training Sparse Autoencoders on Language Models
☆1,001 Updated last week
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆644 Updated last week
saprmarks / dictionary_learning
☆355 Updated 2 months ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆276 Updated last year
ArthurConmy / Automatic-Circuit-Discovery
☆247 Updated last year
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆224 Updated 10 months ago
neelnanda-io / 1L-Sparse-Autoencoder
☆130 Updated 2 years ago
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆280 Updated last year
adamkarvonen / SAEBench
☆131 Updated last week
stanfordnlp / pyvene
Stanford NLP Python library for understanding and improving PyTorch models via interventions
☆821 Updated 2 weeks ago
openai / sparse_autoencoder
☆532 Updated last year
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆229 Updated 2 months ago
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆535 Updated 2 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆219 Updated last week
neelnanda-io / Crosscoders
☆54 Updated 11 months ago
jacobdunefsky / transcoder_circuits
☆181 Updated 11 months ago
saprmarks / feature-circuits
☆191 Updated 2 weeks ago
callummcdougall / ARENA_3.0
☆755 Updated last month
Prisma-Multimodal / ViT-Prisma
ViT Prisma is a mechanistic interpretability library for Vision and Video Transformers (ViTs).
☆315 Updated 3 months ago
ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
☆375 Updated 11 months ago
ARBORproject / arborproject.github.io
☆81 Updated 8 months ago
redwoodresearch / Easy-Transformer
☆127 Updated last year
Dakingrai / awesome-mechanistic-interpretability-lm-papers
☆206 Updated 11 months ago
Butanium / nnterp
Unified access to Large Language Model modules using NNsight
☆52 Updated last week
alan-cooney / transformer-from-scratch
Decoder only transformer, built from scratch with PyTorch
☆31 Updated 2 years ago
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆209 Updated 11 months ago
TransluceAI / docent
☆53 Updated last month
curt-tigges / probity
☆19 Updated 6 months ago
collin-burns / discovering_latent_knowledge
☆279 Updated last year