FOR-sight-ai / interpreto
Interpreto is an interpretability toolbox for LLMs
★141 · Updated this week
Alternatives and similar repositories for interpreto
Users who are interested in interpreto are comparing it to the libraries listed below.
- Sparse Autoencoder for Mechanistic Interpretability · ★290 · Updated last year
- Unified access to Large Language Model modules using NNsight · ★87 · Updated last week
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … · ★241 · Updated last week
- ★83 · Updated 11 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface · ★140 · Updated 11 months ago
- Mechanistic Interpretability Visualizations using React · ★320 · Updated last year
- ★143 · Updated last month
- ★206 · Updated 3 months ago
- This repository collects all relevant resources about interpretability in LLMs · ★391 · Updated last year
- Attribution-based Parameter Decomposition · ★33 · Updated 7 months ago
- ★389 · Updated 5 months ago
- Overcomplete is a Vision-based SAE Toolbox · ★119 · Updated 2 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). · ★238 · Updated last year
- ★267 · Updated last year
- ★132 · Updated 2 years ago
- A toolkit for describing model features and intervening on those features to steer behavior. · ★228 · Updated last month
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] · ★219 · Updated 6 months ago
- ★152 · Updated 5 months ago
- A library for efficient patching and automatic circuit discovery. · ★88 · Updated last month
- ★58 · Updated last year
- Using sparse coding to find distributed representations used by neural networks. · ★293 · Updated 2 years ago
- ViT Prisma is a mechanistic interpretability library for Vision and Video Transformers (ViTs). · ★337 · Updated 6 months ago
- Tools for optimizing steering vectors in LLMs. · ★19 · Updated 9 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing · ★63 · Updated last year
- PyTorch library for Active Fine-Tuning · ★96 · Updated 4 months ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" · ★45 · Updated last year
- Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works… · ★19 · Updated last year
- The nnsight package enables interpreting and manipulating the internals of deep learned models. · ★800 · Updated this week
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers · ★42 · Updated 11 months ago
- ★197 · Updated last year