Modified to support crosscoder training.
☆25Feb 4, 2026Updated last month
Alternatives and similar repositories for crosscoder_learning
Users that are interested in crosscoder_learning are comparing it to the libraries listed below
- Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper.☆16Nov 21, 2025Updated 3 months ago
- ☆17Jul 9, 2025Updated 7 months ago
- Tools for optimizing steering vectors in LLMs.☆20Apr 10, 2025Updated 10 months ago
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.☆63Feb 26, 2026Updated last week
- EmotionCircuits-LLM: A complete, reproducible framework for discovering and controlling emotion circuits in large language models.☆25Oct 20, 2025Updated 4 months ago
- Trains Sparse Autoencoders based on outputs from language models☆11Oct 7, 2024Updated last year
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"☆17Mar 31, 2025Updated 11 months ago
- Engine for collecting, uploading, and downloading model activations☆26Apr 2, 2025Updated 11 months ago
- ACRE: Abstract Causal REasoning Beyond Covariation☆19Dec 7, 2021Updated 4 years ago
- Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)☆61Jul 24, 2025Updated 7 months ago
- ☆20Apr 10, 2025Updated 10 months ago
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)☆20Jan 19, 2025Updated last year
- ☆27Nov 28, 2024Updated last year
- ☆209Oct 14, 2025Updated 4 months ago
- Implementation of "SALSA-CLRS: A Sparse and Scalable Benchmark for Algorithmic Reasoning". SALSA-CLRS is an extension to the original clr…☆22Nov 21, 2023Updated 2 years ago
- ☆24Aug 23, 2025Updated 6 months ago
- Unified access to Large Language Model modules using NNsight☆93Feb 28, 2026Updated last week
- ☆37Jul 4, 2025Updated 8 months ago
- A repository of 2025-2026 AI Safety and Alignment programs, camps, and workshops.☆21May 18, 2025Updated 9 months ago
- ☆24Jan 28, 2025Updated last year
- 100% Simplest reactive charts for Vue.js☆12Jan 3, 2023Updated 3 years ago
- ☆48May 27, 2025Updated 9 months ago
- Sparsify transformers with SAEs and transcoders☆699Updated this week
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers☆75Jun 23, 2025Updated 8 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).☆247Feb 27, 2026Updated last week
- ☆134Oct 28, 2023Updated 2 years ago
- ☆83Feb 25, 2025Updated last year
- Code for reproducing our paper "Not All Language Model Features Are Linear"☆84Nov 27, 2024Updated last year
- ☆155Feb 16, 2026Updated 2 weeks ago
- Open source interpretability artefacts for R1.☆172Apr 21, 2025Updated 10 months ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs☆16Dec 10, 2024Updated last year
- [ICLR 2024 Spotlight] Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Communi…☆11Mar 29, 2024Updated last year
- ☆10Mar 9, 2025Updated 11 months ago
- Create string diagrams with LaTeX!☆14Jan 3, 2025Updated last year
- Sample implementation accompanying the NeurIPS 2019 paper 'Powerset Convolutional Neural Networks' by Chris Wendler, Dan Alistarh, and Ma…☆10Oct 26, 2020Updated 5 years ago
- ☆42Jan 22, 2024Updated 2 years ago
- ppx_system is a syntax extension to know the operating system at compile time☆12May 9, 2023Updated 2 years ago
- MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…☆10Oct 7, 2024Updated last year
- Code for "Multi-scale Abstract Reasoning" paper☆10Oct 17, 2022Updated 3 years ago