FOR-sight-ai / interpreto
Interpreto is an interpretability toolbox for LLMs
★ 124, updated last week
Alternatives and similar repositories for interpreto
Users who are interested in interpreto are comparing it to the libraries listed below.
- Unified access to Large Language Model modules using NNsight ★ 81, updated 2 weeks ago
- Overcomplete is a Vision-based SAE Toolbox ★ 117, updated last month
- ★ 142, updated last month
- Tools for optimizing steering vectors in LLMs. ★ 18, updated 9 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ★ 235, updated last week
- ★ 83, updated 11 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ★ 138, updated 11 months ago
- ★ 58, updated last year
- ★ 387, updated 5 months ago
- Sparse Autoencoder for Mechanistic Interpretability ★ 289, updated last year
- Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works… ★ 19, updated last year
- Modified to support crosscoder training. ★ 25, updated 2 weeks ago
- A library for efficient patching and automatic circuit discovery. ★ 86, updated 3 weeks ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ★ 238, updated last year
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ★ 218, updated 6 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ★ 63, updated last year
- ★ 229, updated last year
- Using sparse coding to find distributed representations used by neural networks. ★ 293, updated 2 years ago
- Mechanistic Interpretability Visualizations using React ★ 315, updated last year
- ★ 265, updated last year
- ★ 57, updated last year
- This repository collects all relevant resources about interpretability in LLMs ★ 389, updated last year
- ★ 114, updated 11 months ago
- ★ 195, updated last year
- Attribution-based Parameter Decomposition ★ 33, updated 7 months ago
- ★ 32, updated 11 months ago
- ★ 204, updated 3 months ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ★ 45, updated last year
- Sparse Autoencoder Training Library ★ 56, updated 8 months ago
- A tiny easily hackable implementation of a feature dashboard. ★ 15, updated 3 months ago