soniajoseph / ViT-Prisma

ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs).

☆214

Alternatives and similar repositories for ViT-Prisma:

Users who are interested in ViT-Prisma are comparing it to the libraries listed below.

saprmarks / dictionary_learning
☆255 · Updated last month
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆189 · Updated 3 months ago
neelnanda-io / 1L-Sparse-Autoencoder
☆121 · Updated last year
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆494 · Updated this week
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆233 · Updated 8 months ago
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆224 · Updated last year
ArthurConmy / Automatic-Circuit-Discovery
☆211 · Updated 5 months ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆233 · Updated 3 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆163 · Updated this week
multimodal-interpretability / maia
Official implementation of MAIA, A Multimodal Automated Interpretability Agent
☆76 · Updated 2 weeks ago
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆522 · Updated this week
KihoPark / linear_rep_geometry
☆89 · Updated last month
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆480 · Updated 9 months ago
openai / sparse_autoencoder
☆441 · Updated 8 months ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆226 · Updated last month
jbloomAus / SAELens
Training Sparse Autoencoders on Language Models
☆669 · Updated this week
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024)
☆186 · Updated 9 months ago
collin-burns / discovering_latent_knowledge
☆263 · Updated last year
adamkarvonen / SAEBench
☆71 · Updated this week
jacobdunefsky / transcoder_circuits
☆61 · Updated 4 months ago
ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
☆327 · Updated 4 months ago
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆206 · Updated last year
saprmarks / feature-circuits
☆150 · Updated 2 weeks ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆117 · Updated 2 years ago
redwoodresearch / Easy-Transformer
☆113 · Updated 7 months ago
gortizji / tangent_task_arithmetic
Source code for "Task arithmetic in the tangent space: Improved editing of pre-trained models".
☆97 · Updated last year
apartresearch / interpretability-starter
🧠 Starter templates for doing interpretability research
☆67 · Updated last year
steering-vectors / steering-vectors
Steering vectors for transformer language models in PyTorch / Hugging Face
☆90 · Updated last month
minyoungg / platonic-rep
☆507 · Updated 7 months ago
likenneth / othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
☆177 · Updated last year