delphi-suite / delphi

small language models training made easy

☆13

Alternatives and similar repositories for delphi

Users who are interested in delphi are comparing it to the libraries listed below.


slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆21 Updated 7 months ago
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆29 Updated last year
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆25 Updated 2 weeks ago
curt-tigges / probity
☆13 Updated 2 months ago
jannik-brinkmann / hugginglens
TransformerLens + HuggingFace
☆11 Updated last year
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆52 Updated last month
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆55 Updated 8 months ago
neelnanda-io / Crosscoders
☆44 Updated 7 months ago
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆35 Updated last year
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆67 Updated 2 months ago
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆18 Updated 5 months ago
understanding-search / maze-transformer
This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks.
☆30 Updated 9 months ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆95 Updated this week
Butanium / nnterp
A small package implementing some useful wrapping around nnsight
☆13 Updated this week
ArthurConmy / Automatic-Circuit-Discovery
☆227 Updated 8 months ago
DeqingFu / transformers-icl-second-order
Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode…
☆16 Updated 7 months ago
noanabeshima / matryoshka-saes
☆17 Updated 7 months ago
jbloomAus / SAEDashboard
☆57 Updated last week
ApolloResearch / sample
Repository with sample code using Apollo's suggested engineering practices
☆9 Updated 6 months ago
KihoPark / linear_rep_geometry
☆95 Updated 4 months ago
callummcdougall / sae_visualizer
☆28 Updated last year
aadityasingh / icl-dynamics
☆19 Updated 2 months ago
redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆14 Updated 3 years ago
redwoodresearch / alignment_faking_public
☆66 Updated last month
safety-research / safety-tooling
Inference API for many LLMs and other useful tools for empirical research
☆49 Updated last week
Jiaxin-Wen / MisleadLM
Official Code for our paper: "Language Models Learn to Mislead Humans via RLHF"
☆14 Updated 8 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in PyTorch / Hugging Face
☆108 Updated 4 months ago
Cadenza-Labs / sleeper-agents
☆11 Updated 11 months ago
bilal-chughtai / rep-theory-mech-interp
☆26 Updated 2 years ago
callummcdougall / sae-exercises-mats
☆22 Updated last year