EleutherAI/elk-generalization

Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from easy questions to hard ones.

☆33

Alternatives and similar repositories for elk-generalization

Users interested in elk-generalization are comparing it to the libraries listed below.

ejnnr / cupbearer
A library for mechanistic anomaly detection
☆22 · Jan 9, 2025 · Updated last year
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆48 · May 31, 2024 · Updated 2 years ago
TeunvdWeij / sandbagging
☆20 · Nov 15, 2024 · Updated last year
EleutherAI / deep-ignorance
☆20 · Jan 7, 2026 · Updated 6 months ago
rgreenblatt / model_organism_public
☆15 · Jun 17, 2025 · Updated last year
MaheepChaudhary / SAE-Ravel
Provides the answer to "How to do patching on all available SAEs on GPT-2?" It is the official repository of the implementation of the p…
☆13 · Jan 26, 2025 · Updated last year
alan-cooney / transformer-from-scratch
Decoder-only transformer, built from scratch with PyTorch
☆33 · Oct 22, 2023 · Updated 2 years ago
longtermrisk / openweights
A Python SDK for LLM fine-tuning and inference on RunPod infrastructure
☆30 · May 12, 2026 · Updated 2 months ago
EleutherAI / features-across-time
Understanding how features learned by neural networks evolve throughout training
☆41 · Oct 24, 2024 · Updated last year
annahdo / implementing_activation_steering
A collection of different ways to access and modify internal model activations in LLMs
☆24 · Oct 18, 2024 · Updated last year
rhubarbwu / linguistic-collapse
Codebase for Linguistic Collapse: Neural Collapse in (Large) Language Models [NeurIPS 2024] [arXiv:2405.17767]
☆19 · Apr 14, 2025 · Updated last year
safety-research / safety-examples
☆31 · Nov 11, 2025 · Updated 8 months ago
FlyingPumba / InterpBench
A benchmark for mechanistic discovery of circuits in Transformers
☆17 · Dec 15, 2024 · Updated last year
poking-agents / modular-public
☆34 · Jun 4, 2025 · Updated last year
saprmarks / dictionary_learning
☆427 · Aug 21, 2025 · Updated 11 months ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆221 · Jul 13, 2026 · Updated last week
jplhughes / dotfiles
Easily deploy my zsh and tmux configuration on new machines. Includes local and remote aliases to improve workflow.
☆15 · Apr 23, 2026 · Updated 2 months ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆258 · Jan 27, 2025 · Updated last year
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆58 · Oct 30, 2025 · Updated 8 months ago
XinyuHua / dyploc-acl2021
Official repository for "DYPLOC: Dynamic Planning of Content Using Mixed Language Models for Opinion Text Generation"
☆10 · May 20, 2022 · Updated 4 years ago
simple-stories / simple_stories_train
Trains small LMs. Designed for training on SimpleStories.
☆14 · Sep 15, 2025 · Updated 10 months ago
pHaeusler / tinycatstories
☆10 · Aug 14, 2023 · Updated 2 years ago
safety-research / SHADE-Arena
☆26 · Jun 22, 2025 · Updated last year
pHaeusler / tic_tac_transformer
☆11 · Sep 26, 2023 · Updated 2 years ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆303 · Jul 20, 2024 · Updated 2 years ago
saprmarks / feature-circuits
☆223 · Oct 14, 2025 · Updated 9 months ago
allfed / allfed-integrated-model
Integrated model to calculate the effects of resilient foods in catastrophic events
☆11 · May 20, 2025 · Updated last year
jcmgray / einsum_bmm
einsum via batch matrix multiply
☆15 · Nov 29, 2023 · Updated 2 years ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆265 · Feb 27, 2026 · Updated 4 months ago
davisrbr / conjectures-arxiv
OpenConjecture, a dataset of mathematics conjectures pulled from papers published on arXiv
☆15 · Jul 12, 2026 · Updated last week
alan-cooney / transformer-lens-starter-template
A quick way to get started with TransformerLens
☆14 · Dec 13, 2023 · Updated 2 years ago
Blkalkin / Optimal-TestTime
☆10 · Mar 24, 2025 · Updated last year
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆74 · Jun 19, 2024 · Updated 2 years ago
clevcode / reversal-curse
Reversal Curse Experiment
☆15 · Sep 24, 2023 · Updated 2 years ago
EleutherAI / steering-llama3
☆30 · Aug 2, 2024 · Updated last year
samuelarnesen / nyu-debate-modeling
☆25 · Oct 4, 2024 · Updated last year
adamkarvonen / SAEBench
☆177 · May 1, 2026 · Updated 2 months ago
wesg52 / llm-context-neurons
Find context neurons in Pythia models.
☆13 · Jun 13, 2023 · Updated 3 years ago
yacineMTB / just-large-models
Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip.
☆44 · Sep 6, 2023 · Updated 2 years ago