ARBORproject / arborproject.github.io

☆83

Alternatives and similar repositories for arborproject.github.io

Users who are interested in arborproject.github.io are comparing it to the libraries listed below.

UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆88 Updated last month
neelnanda-io / Crosscoders
☆58 Updated last year
saprmarks / feature-circuits
☆206 Updated 3 months ago
jacobdunefsky / transcoder_circuits
☆197 Updated last year
ArthurConmy / Automatic-Circuit-Discovery
☆267 Updated last year
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆45 Updated last year
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆320 Updated last year
ndif-team / nnterp
Unified access to Large Language Model modules using NNsight
☆87 Updated last week
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆241 Updated last week
neelnanda-io / 1L-Sparse-Autoencoder
☆132 Updated 2 years ago
KihoPark / linear_rep_geometry
☆115 Updated 11 months ago
adamkarvonen / SAEBench
☆143 Updated last month
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆238 Updated last year
redwoodresearch / Easy-Transformer
☆138 Updated last year
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆63 Updated last year
evandez / relations
How do transformer LMs encode relations?
☆56 Updated last year
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆56 Updated 9 months ago
science-of-finetuning / crosscoder_learning
Modified to support crosscoder training.
☆25 Updated 3 weeks ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆290 Updated last year
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆293 Updated 2 years ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆140 Updated 11 months ago
saprmarks / dictionary_learning
☆389 Updated 5 months ago
PAIR-code / pretraining-tda
☆32 Updated 11 months ago
ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
☆391 Updated last year
saprmarks / geometry-of-truth
☆99 Updated last year
wesg52 / sparse-probing-paper
Sparse probing paper full code.
☆66 Updated 2 years ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆33 Updated 7 months ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆57 Updated 3 months ago
mlepori1 / NeuroSurgeon
NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers
☆42 Updated 11 months ago
Dakingrai / awesome-mechanistic-interpretability-lm-papers
☆230 Updated last year