TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆2,007 · Updated this week
Alternatives and similar repositories for TransformerLens:
Users interested in TransformerLens also compare it to the libraries listed below.
- Training Sparse Autoencoders on Language Models (☆695, updated last week)
- The nnsight package enables interpreting and manipulating the internals of deep learned models (☆536, updated this week)
- Stanford NLP Python library for understanding and improving PyTorch models via interventions (☆728, updated last week)
- ☆490, updated this week
- Sparsify transformers with SAEs and transcoders (☆499, updated this week)
- Stanford NLP Python library for Representation Finetuning (ReFT) (☆1,454, updated last month)
- Representation Engineering: A Top-Down Approach to AI Transparency (☆811, updated 7 months ago)
- ☆448, updated 8 months ago
- Locating and editing factual associations in GPT (NeurIPS 2022) (☆614, updated 11 months ago)
- A bibliography and survey of the papers surrounding o1 (☆1,183, updated 4 months ago)
- Mechanistic Interpretability Visualizations using React (☆238, updated 3 months ago)
- Language model alignment-focused deep learning curriculum (☆1,353, updated 7 months ago)
- Sparse Autoencoder for Mechanistic Interpretability (☆234, updated 8 months ago)
- Tools for understanding how transformer predictions are built layer-by-layer (☆481, updated 10 months ago)
- A repository collecting relevant resources on interpretability in LLMs (☆332, updated 5 months ago)
- Resources for skilling up in AI alignment research engineering; covers basics of deep learning, mechanistic interpretability, and RL (☆207, updated last year)
- ☆264, updated last month
- Using sparse coding to find distributed representations used by neural networks (☆226, updated last year)
- What would you do with 1000 H100s... (☆1,025, updated last year)
- AllenAI's post-training codebase (☆2,854, updated this week)
- Deep learning for dummies: all the practical details and useful utilities that go into working with real models (☆783, updated last month)
- Cramming the training of a (BERT-type) language model into limited compute (☆1,326, updated 9 months ago)
- The hub for EleutherAI's work on interpretability and learning dynamics (☆2,432, updated 3 weeks ago)
- Mass-editing thousands of facts into a transformer memory (ICLR 2023) (☆472, updated last year)
- Reference implementation for DPO (Direct Preference Optimization) (☆2,486, updated 7 months ago)
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (☆516, updated 2 months ago)
- ☆214, updated 6 months ago
- A library for making RepE control vectors (☆562, updated 2 months ago)
- Fast & simple repository for pre-training and fine-tuning T5-style models (☆1,001, updated 7 months ago)
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs) (☆822, updated last week)