TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆1,901 · Updated this week
Alternatives and similar repositories for TransformerLens:
Users who are interested in TransformerLens are comparing it to the libraries listed below.
- Training Sparse Autoencoders on Language Models ☆637 · Updated this week
- The nnsight package enables interpreting and manipulating the internals of deep learning models. ☆503 · Updated this week
- Stanford NLP Python library for understanding and improving PyTorch models via interventions ☆706 · Updated last week
- Representation Engineering: A Top-Down Approach to AI Transparency ☆795 · Updated 6 months ago
- Sparsify transformers with SAEs and transcoders ☆476 · Updated this week
- Mechanistic Interpretability Visualizations using React ☆235 · Updated 2 months ago
- Tools for understanding how transformer predictions are built layer-by-layer ☆477 · Updated 9 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆322 · Updated 4 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,172 · Updated 3 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆775 · Updated last week
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆808 · Updated this week
- Minimalistic large language model 3D-parallelism training ☆1,630 · Updated this week
- What would you do with 1000 H100s... ☆1,009 · Updated last year
- Locating and editing factual associations in GPT (NeurIPS 2022) ☆603 · Updated 10 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆217 · Updated 7 months ago
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆207 · Updated last year
- Utilities for decoding deep representations (like sentence embeddings) back to text ☆769 · Updated last month
- Fast & Simple repository for pre-training and fine-tuning T5-style models ☆993 · Updated 6 months ago
- Recipes to scale inference-time compute of open models ☆1,019 · Updated this week
- Using sparse coding to find distributed representations used by neural networks. ☆217 · Updated last year
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,238 · Updated this week
- A library for making RepE control vectors ☆552 · Updated last month
- A benchmark to evaluate language models on questions I've previously asked them to solve. ☆976 · Updated last month
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆183 · Updated 2 months ago
- Language model alignment-focused deep learning curriculum ☆1,331 · Updated 6 months ago
- List of papers on hallucination detection in LLMs. ☆785 · Updated last week