FOR-sight-ai / interpreto
Interpreto is an interpretability toolbox for LLMs
☆95, updated 2 weeks ago
Alternatives and similar repositories for interpreto
Users that are interested in interpreto are comparing it to the libraries listed below
- Overcomplete is a Vision-based SAE Toolbox (☆112, updated last month)
- Tools for optimizing steering vectors in LLMs. (☆17, updated 8 months ago)
- Unified access to Large Language Model modules using NNsight (☆71, updated last week)
- ☆138, updated last week
- ☆83, updated 10 months ago
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers (☆42, updated 10 months ago)
- Attribution-based Parameter Decomposition (☆33, updated 6 months ago)
- Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works… (☆19, updated last year)
- A library for efficient patching and automatic circuit discovery. (☆84, updated last week)
- ☆200, updated 2 months ago
- Sparse Autoencoder for Mechanistic Interpretability (☆285, updated last year)
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" (☆44, updated last year)
- ☆112, updated 10 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface (☆137, updated 10 months ago)
- Mechanistic Interpretability Visualizations using React (☆306, updated last year)
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … (☆234, updated 2 weeks ago)
- Open source replication of Anthropic's Crosscoders for Model Diffing (☆63, updated last year)
- A tiny easily hackable implementation of a feature dashboard. (☆15, updated 2 months ago)
- Engine for collecting, uploading, and downloading model activations (☆24, updated 9 months ago)
- Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok … (☆29, updated last month)
- ☆83, updated 3 weeks ago
- Sparse Autoencoder Training Library (☆56, updated 8 months ago)
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively. (☆49, updated this week)
- ☆193, updated last year
- ☆58, updated last year
- ☆262, updated last year
- ☆20, updated 8 months ago
- ☆132, updated 2 years ago
- ☆36, updated last year
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). (☆236, updated last year)