Engine for collecting, uploading, and downloading model activations
☆26 · Apr 2, 2025 · Updated 11 months ago
Alternatives and similar repositories for activault
Users that are interested in activault are comparing it to the libraries listed below
- Attribution-based Parameter Decomposition · ☆34 · Jun 11, 2025 · Updated 8 months ago
- Applying SAEs for fine-grained control · ☆25 · Dec 15, 2024 · Updated last year
- Open source replication of Anthropic's Crosscoders for Model Diffing · ☆64 · Oct 27, 2024 · Updated last year
- A library for training crosscoders · ☆16 · May 28, 2025 · Updated 9 months ago
- A library for efficient patching and automatic circuit discovery · ☆90 · Dec 31, 2025 · Updated 2 months ago
- Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper · ☆16 · Nov 21, 2025 · Updated 3 months ago
- Unified access to Large Language Model modules using NNsight · ☆93 · Updated this week
- ☆17 · Jul 9, 2025 · Updated 7 months ago
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" · ☆13 · Jul 18, 2024 · Updated last year
- ☆17 · Aug 30, 2025 · Updated 6 months ago
- ☆153 · Dec 30, 2025 · Updated 2 months ago
- ☆23 · Jun 30, 2025 · Updated 8 months ago
- ☆20 · Apr 10, 2025 · Updated 10 months ago
- PyTorch and NNsight implementation of AtP* (Kramár et al. 2024, DeepMind) · ☆20 · Jan 19, 2025 · Updated last year
- ☆83 · Feb 25, 2025 · Updated last year
- Code for the paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis" · ☆19 · Jun 12, 2025 · Updated 8 months ago
- [NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers · ☆21 · May 16, 2023 · Updated 2 years ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … · ☆243 · Feb 23, 2026 · Updated last week
- Mapping out the "memory" of neural nets with data attribution · ☆45 · Updated this week
- Modified to support crosscoder training · ☆25 · Feb 4, 2026 · Updated last month
- Efficiently computing & storing token n-grams from large corpora · ☆26 · Oct 6, 2024 · Updated last year
- Open source interpretability artefacts for R1 · ☆172 · Apr 21, 2025 · Updated 10 months ago
- ☆30 · Dec 2, 2024 · Updated last year
- ☆70 · Mar 6, 2025 · Updated 11 months ago
- ☆48 · May 27, 2025 · Updated 9 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods · ☆168 · Feb 22, 2026 · Updated last week
- ☆396 · Aug 21, 2025 · Updated 6 months ago
- ☆209 · Oct 14, 2025 · Updated 4 months ago
- Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments · ☆31 · Apr 22, 2025 · Updated 10 months ago
- ☆29 · Jan 12, 2026 · Updated last month
- ☆13 · Oct 5, 2025 · Updated 4 months ago
- 🪝 PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models · ☆12 · May 30, 2025 · Updated 9 months ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" · ☆47 · May 31, 2024 · Updated last year
- DL Backtrace is a new explainability technique for deep learning models that works for any modality and model type · ☆23 · Feb 16, 2026 · Updated 2 weeks ago
- A Blackjack game with GUI written in Java · ☆11 · Nov 21, 2018 · Updated 7 years ago
- A framework for evaluating Machine Translation models · ☆12 · May 26, 2025 · Updated 9 months ago
- Trains small LMs. Designed for training on SimpleStories · ☆12 · Sep 15, 2025 · Updated 5 months ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch · ☆10 · Aug 7, 2024 · Updated last year
- Implementing LRP (Layer-wise Relevance Propagation) for a sequence-to-sequence model with GRU layers · ☆12 · Sep 8, 2023 · Updated 2 years ago