TransformerLensOrg / TransformerLensLinks

A library for mechanistic interpretability of GPT-style language models

☆2,413

Alternatives and similar repositories for TransformerLens

Users that are interested in TransformerLens are comparing it to the libraries listed below

Sorting:

jbloomAus / SAELens
Training Sparse Autoencoders on Language Models
☆895Updated this week
callummcdougall / ARENA_3.0
☆634Updated this week
stanfordnlp / pyvene
Stanford NLP Python library for understanding and improving PyTorch models via interventions
☆783Updated last week
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆619Updated this week
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆595Updated this week
andyzoujm / representation-engineering
Representation Engineering: A Top-Down Approach to AI Transparency
☆852Updated 11 months ago
openai / sparse_autoencoder
☆503Updated last year
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆219Updated last year
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆272Updated 7 months ago
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆512Updated last year
saprmarks / dictionary_learning
☆320Updated 2 weeks ago
srush / awesome-o1
A bibliography and survey of the papers surrounding o1
☆1,207Updated 8 months ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆257Updated last year
stanfordnlp / pyreft
Stanford NLP Python library for Representation Finetuning (ReFT)
☆1,500Updated 5 months ago
ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
☆366Updated 9 months ago
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆261Updated last year
open-thought / system-2-research
System 2 Reasoning Link Collection
☆848Updated 4 months ago
kmeng01 / rome
Locating and editing factual associations in GPT (NeurIPS 2022)
☆653Updated last year
EleutherAI / cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
☆809Updated this week
vec2text / vec2text
utilities for decoding deep representations (like sentence embeddings) back to text
☆912Updated 2 months ago
Prisma-Multimodal / ViT-Prisma
ViT Prisma is a mechanistic interpretability library for Vision and Video Transformers (ViTs).
☆289Updated last week
ArthurConmy / Automatic-Circuit-Discovery
☆233Updated 10 months ago
jacobhilton / deep_learning_curriculum
Language model alignment-focused deep learning curriculum
☆1,433Updated 11 months ago
EleutherAI / pythia
The hub for EleutherAI's work on interpretability and learning dynamics
☆2,575Updated last month
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆207Updated 7 months ago
JShollaj / awesome-llm-interpretability
A curated list of Large Language Model (LLM) Interpretability resources.
☆1,385Updated last month
open-thought / reasoning-gym
procedural reasoning datasets
☆998Updated this week
srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆1,079Updated last year
Dakingrai / awesome-mechanistic-interpretability-lm-papers
☆177Updated 8 months ago
openai / automated-interpretability
☆1,027Updated last year