TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆2,703 · Updated this week
Alternatives and similar repositories for TransformerLens
Users interested in TransformerLens are comparing it to the libraries listed below.
- Training Sparse Autoencoders on Language Models ☆1,015 · Updated this week
- ☆764 · Updated last month
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆692 · Updated this week
- Sparsify transformers with SAEs and transcoders ☆647 · Updated last week
- Stanford NLP Python library for understanding and improving PyTorch models via interventions ☆823 · Updated 3 weeks ago
- Representation Engineering: A Top-Down Approach to AI Transparency ☆903 · Updated last year
- ☆537 · Updated last year
- Mechanistic Interpretability Visualizations using React ☆297 · Updated 10 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,208 · Updated 11 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆278 · Updated last year
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆230 · Updated 2 months ago
- ☆355 · Updated 2 months ago
- Tools for understanding how transformer predictions are built layer-by-layer ☆536 · Updated 2 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆819 · Updated 3 months ago
- Locating and editing factual associations in GPT (NeurIPS 2022) ☆680 · Updated last year
- utilities for decoding deep representations (like sentence embeddings) back to text ☆961 · Updated 2 months ago
- ☆248 · Updated last year
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,521 · Updated 8 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆2,044 · Updated this week
- This repository collects all relevant resources about interpretability in LLMs ☆377 · Updated last year
- Using sparse coding to find distributed representations used by neural networks. ☆281 · Updated last year
- ViT Prisma is a mechanistic interpretability library for Vision and Video Transformers (ViTs). ☆319 · Updated 3 months ago
- ☆1,052 · Updated last year
- The hub for EleutherAI's work on interpretability and learning dynamics ☆2,654 · Updated 4 months ago
- System 2 Reasoning Link Collection ☆857 · Updated 7 months ago
- open source interpretability platform 🧠 ☆466 · Updated this week
- Language model alignment-focused deep learning curriculum ☆1,488 · Updated last year
- Training Large Language Model to Reason in a Continuous Latent Space ☆1,313 · Updated 2 months ago
- Code for BLT research paper ☆2,004 · Updated 5 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆224 · Updated 10 months ago