TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆2,102 Updated last week
Alternatives and similar repositories for TransformerLens:
Users who are interested in TransformerLens compare it with the libraries listed below.
- Training Sparse Autoencoders on Language Models ☆737 Updated this week
- ☆519 Updated this week
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆548 Updated this week
- ☆453 Updated 9 months ago
- Sparsify transformers with SAEs and transcoders ☆520 Updated this week
- Stanford NLP Python library for understanding and improving PyTorch models via interventions ☆734 Updated last week
- Representation Engineering: A Top-Down Approach to AI Transparency ☆819 Updated 8 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,190 Updated 5 months ago
- Mechanistic Interpretability Visualizations using React ☆241 Updated 4 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆241 Updated 9 months ago
- ☆274 Updated 2 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,462 Updated this week
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆210 Updated last year
- Tools for understanding how transformer predictions are built layer-by-layer ☆488 Updated 10 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆786 Updated last month
- Locating and editing factual associations in GPT (NeurIPS 2022) ☆627 Updated last year
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆197 Updated 4 months ago
- Recipes to scale inference-time compute of open models ☆1,058 Updated 2 months ago
- The hub for EleutherAI's work on interpretability and learning dynamics ☆2,463 Updated last month
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,462 Updated 2 months ago
- A library for making RepE control vectors ☆580 Updated 3 months ago
- Using sparse coding to find distributed representations used by neural networks. ☆236 Updated last year
- A curated list of Large Language Model (LLM) Interpretability resources. ☆1,304 Updated 4 months ago
- Reference implementation for DPO (Direct Preference Optimization) ☆2,542 Updated 8 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆341 Updated 5 months ago
- Minimalistic large language model 3D-parallelism training ☆1,808 Updated this week
- What would you do with 1000 H100s... ☆1,038 Updated last year
- ☆219 Updated 6 months ago
- utilities for decoding deep representations (like sentence embeddings) back to text ☆796 Updated 2 weeks ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆833 Updated this week