PAIR-code / pretraining-tda

☆22

Alternatives and similar repositories for pretraining-tda

Users who are interested in pretraining-tda are comparing it to the libraries listed below.

logix-project / logix
AI Logging for Interpretability and Explainability🔬
☆125 Updated last year
roeehendel / icl_task_vectors
☆96 Updated last year
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆167 Updated last year
adamkarvonen / SAEBench
☆109 Updated 3 weeks ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆52 Updated 10 months ago
saprmarks / geometry-of-truth
☆89 Updated last year
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆73 Updated 2 weeks ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆202 Updated this week
TristanThrush / perplexity-correlations
Simple and scalable tools for data-driven pretraining data selection.
☆24 Updated 2 months ago
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆12 Updated 6 months ago
angie-chen55 / pref-learning-ranking-acc
☆13 Updated last year
y0mingzhang / diffuse-distributions
Forcing Diffuse Distributions out of Language Models
☆17 Updated 10 months ago
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆175 Updated 3 months ago
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆87 Updated last week
mega002 / ff-layers
The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le…
☆94 Updated 3 years ago
wesg52 / sparse-probing-paper
Sparse probing paper full code.
☆58 Updated last year
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆57 Updated 9 months ago
epfl-dlab / llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
☆78 Updated last year
KihoPark / linear_rep_geometry
☆103 Updated 5 months ago
montemac / activation_additions
Algebraic value editing in pretrained language models
☆65 Updated last year
milesaturpin / cot-unfaithfulness
☆47 Updated last year
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆120 Updated 5 months ago
OpenMOSS / Language-Model-SAEs
For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
☆141 Updated this week
redwoodresearch / Easy-Transformer
☆121 Updated last year
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆54 Updated 3 months ago
jacobdunefsky / transcoder_circuits
☆157 Updated 8 months ago
hannamw / EAP-IG
☆47 Updated 2 weeks ago
evandez / relations
How do transformer LMs encode relations?
☆52 Updated last year
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆112 Updated last month
neelnanda-io / 1L-Sparse-Autoencoder
☆124 Updated last year