FOR-sight-ai / interpreto
Interpreto is an interpretability toolbox for LLMs
⭐ 35 · Updated last week
Alternatives and similar repositories for interpreto
Users interested in interpreto are comparing it to the libraries listed below.
- Overcomplete is a vision-based SAE toolbox · ⭐ 79 · Updated last month
- Sparse Autoencoder Training Library · ⭐ 54 · Updated 4 months ago
- Attribution-based Parameter Decomposition · ⭐ 30 · Updated 3 months ago
- Tools for optimizing steering vectors in LLMs · ⭐ 11 · Updated 5 months ago
- ⭐ 52 · Updated last year
- Universal Neurons in GPT2 Language Models · ⭐ 30 · Updated last year
- Engine for collecting, uploading, and downloading model activations · ⭐ 22 · Updated 5 months ago
- Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works… · ⭐ 19 · Updated last year
- Latent Program Network (from the "Searching Latent Program Spaces" paper) · ⭐ 96 · Updated 6 months ago
- ⭐ 19 · Updated 2 years ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" · ⭐ 41 · Updated last year
- Influenciae is a TensorFlow toolbox for influence functions · ⭐ 64 · Updated last year
- A tiny, easily hackable implementation of a feature dashboard · ⭐ 13 · Updated 2 months ago
- Applying SAEs for fine-grained control · ⭐ 23 · Updated 9 months ago
- A TinyStories LM with SAEs and transcoders · ⭐ 13 · Updated 5 months ago
- ⭐ 28 · Updated 7 months ago
- A library for training crosscoders · ⭐ 11 · Updated 3 months ago
- nanoGPT-like codebase for LLM training · ⭐ 107 · Updated 4 months ago
- Flexible library for merging large language models (LLMs) via evolutionary optimization (ACL 2025 Demo) · ⭐ 85 · Updated last month
- Supports PyTorch FSDP for optimizers · ⭐ 84 · Updated 9 months ago
- ⭐ 23 · Updated 9 months ago
- ⭐ 12 · Updated 6 months ago
- Sparse and discrete interpretability tool for neural networks · ⭐ 63 · Updated last year
- ⭐ 119 · Updated 3 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" · ⭐ 83 · Updated 10 months ago
- Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper · ⭐ 11 · Updated last month
- ⭐ 34 · Updated last year
- ⭐ 17 · Updated 5 months ago
- PyTorch and NNsight implementation of AtP* (Kramár et al., 2024, DeepMind) · ⭐ 19 · Updated 8 months ago
- Materials for ConceptARC paper · ⭐ 102 · Updated 10 months ago