interp-reasoning/thought-anchors

interp-reasoning / thought-anchors

⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.

☆137

Alternatives and similar repositories for thought-anchors

Users who are interested in thought-anchors are comparing it to the libraries listed below.

interp-reasoning / thought-anchors.com
⚓️ Interactive playground for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.
☆18 · Dec 20, 2025 · Updated 7 months ago
ajobi-uhc / seer
This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix …
☆146 · Feb 8, 2026 · Updated 5 months ago
clarifying-EM / model-organisms-for-EM
Code repo for the model organisms and convergent directions of EM papers.
☆72 · Sep 22, 2025 · Updated 10 months ago
cvenhoff / steering-thinking-llms
☆38 · Jul 9, 2025 · Updated last year
safety-research / false-facts
☆50 · Jul 4, 2025 · Updated last year
science-of-finetuning / diffing-toolkit
A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.
☆78 · Updated this week
goodfire-ai / scribe
☆85 · Feb 18, 2026 · Updated 5 months ago
jettjaniak / chainscope
Repository for the "Chain-of-Thought Reasoning In The Wild Is Not Always Faithful" paper
☆35 · Mar 31, 2026 · Updated 3 months ago
ndif-team / nnterp
Unified access to Large Language Model modules using NNsight
☆116 · Jul 2, 2026 · Updated 2 weeks ago
decoderesearch / automated-interpretability
☆24 · Feb 13, 2026 · Updated 5 months ago
EleutherAI / attribute
☆16 · Nov 14, 2025 · Updated 8 months ago
Centrattic / global-cot-analysis
Global CoT Analysis: Initial attempts to uncover patterns across many chains of thought
☆20 · Feb 10, 2026 · Updated 5 months ago
TruthfulAI-research / negation_neglect
Code for Negation Neglect
☆16 · May 22, 2026 · Updated 2 months ago
Aries-iai / Manifold_Steering
The official implementation for "Mitigating Overthinking in Large Reasoning Models via Manifold Steering"
☆15 · May 29, 2025 · Updated last year
mega002 / llm-interp-tau
Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University
☆331 · Feb 8, 2026 · Updated 5 months ago
cywinski / eliciting-secret-knowledge
Code repository for "Eliciting Secret Knowledge from Language Models"
☆23 · Mar 30, 2026 · Updated 3 months ago
tim-hua-01 / steering-eval-awareness-public
☆17 · Mar 16, 2026 · Updated 4 months ago
goodfire-ai / scribe-task-suite
A suite of interpretability tasks to evaluate agents using Scribe for notebook access
☆18 · Oct 2, 2025 · Updated 9 months ago
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆995 · Updated this week
alexjfoote / Neuron2Graph
Tools for exploring Transformer neuron behaviour, including input pruning and diversification.
☆10 · Jun 6, 2023 · Updated 3 years ago
technion-cs-nlp / parametric-faithfulness
☆23 · Aug 30, 2025 · Updated 10 months ago
goodfire-ai / param-decomp
Parameter Decomposition
☆133 · Updated this week
cvenhoff / thinking-llms-interp
☆25 · Jul 8, 2026 · Updated 2 weeks ago
safety-research / inoculation-prompting
☆15 · Oct 13, 2025 · Updated 9 months ago
apartresearch / Integer_Addition
✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks
☆19 · Aug 16, 2024 · Updated last year
aadityasingh / icl-dynamics
☆26 · Feb 20, 2026 · Updated 5 months ago
TransluceAI / circuits
ADAG: Transluce's MLP neuron-level circuit tracing library
☆34 · Apr 10, 2026 · Updated 3 months ago
goodfire-ai / causalab
☆104 · Jul 15, 2026 · Updated last week
curt-tigges / crosslayer-coding
☆18 · Jul 9, 2025 · Updated last year
TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆3,705 · Updated this week
dreadnode / agent-lens
Agent observability and replay tooling for AI safety & interpretability research.
☆109 · Jun 19, 2026 · Updated last month
longtermrisk / openweights
A python sdk for LLM finetuning and inference on runpod infrastructure
☆30 · May 12, 2026 · Updated 2 months ago
cadentj / caft
☆25 · Mar 30, 2026 · Updated 3 months ago
uzaymacar / self-supervision
Implementations of several self-supervised pretext tasks for language and vision modalities in PyTorch.
☆13 · Jan 19, 2021 · Updated 5 years ago
callummcdougall / ARENA_3.0
☆1,185 · Updated this week
dtch1997 / steering-bench
Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"
☆22 · Dec 14, 2024 · Updated last year
AngelaZZZ-611 / reasoning_models_probing
☆21 · May 14, 2026 · Updated 2 months ago
safety-research / introspection-adapters
Training LLMs to Report Their Learned Behaviors
☆27 · Apr 28, 2026 · Updated 2 months ago
adamkarvonen / activation_oracles
☆95 · Apr 18, 2026 · Updated 3 months ago