TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆2,274 · Updated this week
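For orientation before the comparison list, here is a minimal sketch of how TransformerLens is typically used: load a hooked model and cache its internal activations for inspection. This is an illustrative example only (the `HookedTransformer` entry point, `run_with_cache`, and the `"gpt2"` pretrained name reflect the library's documented API, but check the current TransformerLens docs for exact signatures).

```python
from transformer_lens import HookedTransformer

# Load a pretrained GPT-2, wrapped so every internal activation has a hook point.
model = HookedTransformer.from_pretrained("gpt2")

# Run a prompt and cache all intermediate activations for inspection.
logits, cache = model.run_with_cache("Mechanistic interpretability is")

# Inspect the residual stream after block 0; shape is (batch, seq_len, d_model).
resid_post_0 = cache["resid_post", 0]
print(resid_post_0.shape)
```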
Alternatives and similar repositories for TransformerLens
Users who are interested in TransformerLens are comparing it to the libraries listed below.
- Training Sparse Autoencoders on Language Models · ☆837 · Updated this week
- Sparsify transformers with SAEs and transcoders · ☆568 · Updated this week
- Stanford NLP Python library for understanding and improving PyTorch models via interventions · ☆756 · Updated 3 weeks ago
- The nnsight package enables interpreting and manipulating the internals of deep learned models. · ☆593 · Updated this week
- ☆586 · Updated this week
- Representation Engineering: A Top-Down Approach to AI Transparency · ☆836 · Updated 10 months ago
- ☆490 · Updated 11 months ago
- A bibliography and survey of the papers surrounding o1 · ☆1,199 · Updated 7 months ago
- Sparse Autoencoder for Mechanistic Interpretability · ☆250 · Updated 11 months ago
- Mechanistic Interpretability Visualizations using React · ☆257 · Updated 6 months ago
- ☆307 · Updated last month
- Locating and editing factual associations in GPT (NeurIPS 2022) · ☆641 · Updated last year
- This repository collects all relevant resources about interpretability in LLMs · ☆358 · Updated 7 months ago
- Tools for understanding how transformer predictions are built layer-by-layer · ☆500 · Updated last year
- ☆1,025 · Updated 6 months ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs) · ☆857 · Updated 2 weeks ago
- System 2 Reasoning Link Collection · ☆838 · Updated 3 months ago
- A library for making RepE control vectors · ☆610 · Updated 5 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research) · ☆202 · Updated 6 months ago
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. · ☆216 · Updated last year
- Stanford NLP Python library for Representation Finetuning (ReFT) · ☆1,490 · Updated 4 months ago
- Using sparse coding to find distributed representations used by neural networks · ☆255 · Updated last year
- Minimalistic large language model 3D-parallelism training · ☆1,926 · Updated last week
- Recipes to scale inference-time compute of open models · ☆1,095 · Updated last month
- The hub for EleutherAI's work on interpretability and learning dynamics · ☆2,542 · Updated last week
- Code for "LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders" · ☆1,541 · Updated 4 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends · ☆1,629 · Updated this week
- Mass-editing thousands of facts into a transformer memory (ICLR 2023) · ☆500 · Updated last year
- ☆4,088 · Updated last year
- A library for advanced large language model reasoning · ☆2,148 · Updated last week