TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆1,797 Updated this week
Alternatives and similar repositories for TransformerLens:
Users who are interested in TransformerLens are comparing it to the libraries listed below.
- Training Sparse Autoencoders on Language Models ☆599 Updated this week
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆469 Updated this week
- ☆428 Updated this week
- Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions ☆688 Updated this week
- ☆413 Updated 6 months ago
- Mechanistic Interpretability Visualizations using React ☆223 Updated last month
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆209 Updated 11 months ago
- Sparse autoencoders ☆414 Updated last week
- Representation Engineering: A Top-Down Approach to AI Transparency ☆779 Updated 5 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,085 Updated 2 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆309 Updated 2 months ago
- Tools for understanding how transformer predictions are built layer-by-layer ☆461 Updated 7 months ago
- ☆220 Updated 2 weeks ago
- What would you do with 1000 H100s... ☆970 Updated last year
- Sparse Autoencoder for Mechanistic Interpretability ☆214 Updated 6 months ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆791 Updated this week
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,723 Updated last month
- Minimalistic large language model 3D-parallelism training ☆1,407 Updated this week
- ☆202 Updated 3 months ago
- Using sparse coding to find distributed representations used by neural networks. ☆210 Updated last year
- System 2 Reasoning Link Collection ☆753 Updated this week
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆758 Updated last week
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆177 Updated last month
- Fast & Simple repository for pre-training and fine-tuning T5-style models ☆990 Updated 5 months ago
- Recipes to scale inference-time compute of open models ☆975 Updated 2 weeks ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,032 Updated this week
- Locating and editing factual associations in GPT (NeurIPS 2022) ☆600 Updated 9 months ago
- NanoGPT (124M) in 3 minutes ☆2,162 Updated this week
- Mass-editing thousands of facts into a transformer memory (ICLR 2023) ☆460 Updated 11 months ago
- A benchmark to evaluate language models on questions I've previously asked them to solve. ☆959 Updated 2 months ago