HoagyC / sparse_coding

Using sparse coding to find distributed representations used by neural networks.

☆261

Alternatives and similar repositories for sparse_coding

Users who are interested in sparse_coding compare it to the libraries listed below.

ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆257 Updated last year
jacobdunefsky / transcoder_circuits
☆154 Updated 8 months ago
saprmarks / dictionary_learning
☆320 Updated 2 weeks ago
ArthurConmy / Automatic-Circuit-Discovery
☆233 Updated 10 months ago
adamkarvonen / SAEBench
☆107 Updated 2 weeks ago
saprmarks / feature-circuits
☆183 Updated 2 weeks ago
openai / sparse_autoencoder
☆503 Updated last year
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆200 Updated this week
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆167 Updated last year
neelnanda-io / 1L-Sparse-Autoencoder
☆123 Updated last year
wesg52 / sparse-probing-paper
Sparse probing paper full code.
☆58 Updated last year
davidbau / baukit
☆220 Updated last year
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆207 Updated 7 months ago
neelnanda-io / Crosscoders
☆50 Updated 8 months ago
shehper / sparse-dictionary-learning
An Open Source Implementation of Anthropic's Paper: "Towards Monosemanticity: Decomposing Language Models with Dictionary Learning"
☆48 Updated last year
Dakingrai / awesome-mechanistic-interpretability-lm-papers
☆177 Updated 8 months ago
redwoodresearch / Easy-Transformer
☆121 Updated 11 months ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆272 Updated 7 months ago
ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
☆366 Updated 9 months ago
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆595 Updated this week
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆129 Updated 8 months ago
OpenMOSS / Language-Model-SAEs
For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
☆141 Updated last week
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆119 Updated 5 months ago
jbloomAus / SAELens
Training Sparse Autoencoders on Language Models
☆895 Updated this week
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆619 Updated this week
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆57 Updated 9 months ago
IBM / activation-steering
General-purpose activation steering library
☆85 Updated 2 months ago
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆512 Updated last year
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆96 Updated last year
andyrdt / refusal_direction
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
☆246 Updated last month