TransluceAI/introspective-interp

Repository for "Training Language Models To Explain Their Own Computations"

☆23

Alternatives and similar repositories for introspective-interp

Users who are interested in introspective-interp are comparing it to the libraries listed below.

edeyneka / pdf-reader-extension
☆13 · Mar 9, 2025 · Updated last year
jacobdunefsky / llm-steering-opt
Tools for optimizing steering vectors in LLMs.
☆22 · Apr 10, 2025 · Updated last year
UKPLab / tmlr2026-manifold-analysis
☆21 · Mar 3, 2026 · Updated 4 months ago
OscarXZQ / delta_activations
Official code release for Delta Activations: A Representation for Finetuned Large Language Models
☆20 · Sep 5, 2025 · Updated 10 months ago
HarmanDotpy / pairwise-self-verification
[ICML 2026] Code for V1: Unifying Generation and Self-Verification for Parallel Reasoners.
☆39 · Mar 5, 2026 · Updated 4 months ago
successar / FRESH
☆26 · Jun 12, 2023 · Updated 3 years ago
watcl-lab / positional_attention
Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"
☆14 · May 26, 2025 · Updated last year
oclivegriffin / crosscode
A library for training crosscoders
☆17 · May 28, 2025 · Updated last year
idoatad / TensorLens
Official PyTorch implementation for "TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors" [ACL 2026]
☆47 · Apr 14, 2026 · Updated 3 months ago
curt-tigges / crosslayer-coding
☆18 · Jul 9, 2025 · Updated last year
peterbhase / ExplanationSearch
Code for paper "Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals"
☆18 · Oct 17, 2022 · Updated 3 years ago
chanind / claude-auto-research-synthsaebench
☆23 · Mar 11, 2026 · Updated 4 months ago
technion-cs-nlp / parametric-faithfulness
☆23 · Aug 30, 2025 · Updated 10 months ago
GChrysostomou / ood_faith
☆13 · Jul 26, 2023 · Updated 2 years ago
GBATZOLIS / BitstreamDiffusion
☆15 · Updated this week
harish-kamath / rqae
Residual Quantization Autoencoder, used for interpreting LLMs
☆14 · Jan 1, 2025 · Updated last year
McGill-NLP / latentlens
Code and data for the paper "LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs"
☆48 · Mar 31, 2026 · Updated 3 months ago
juliensimon / ocel-generator
Generate realistic multi-agent workflow traces with LLM-enriched content, semantic validation, and PM4Py compatibility. pip install open-…
☆16 · Apr 8, 2026 · Updated 3 months ago
ajobi-uhc / seer
This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix …
☆146 · Feb 8, 2026 · Updated 5 months ago
maiush / OpenCharacterTraining
Open Character Training
☆92 · Apr 4, 2026 · Updated 3 months ago
violetxi / ExpRL
☆19 · Jun 16, 2026 · Updated last month
akozlo / AutoInterp
An LLM agent framework for automated AI interpretability research
☆17 · Apr 17, 2026 · Updated 3 months ago
ArthurConmy / MishformerLens
MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…
☆10 · Oct 7, 2024 · Updated last year
emrecanacikgoz / Tool-R0
☆35 · Apr 3, 2026 · Updated 3 months ago
safety-research / introspection-adapters
Training LLMs to Report Their Learned Behaviors
☆27 · Apr 28, 2026 · Updated 2 months ago
LLM-Interp / CLT-Forge
A Mechanistic Interpretability Toolkit for Cross-Layer Transcoder Training and Attribution-Graph Visualization
☆102 · Jul 10, 2026 · Updated last week
yuzhenmao / IceCache
Implementation for IceCache: Memory-Efficient KV-cache Management for Long-Sequence LLMs (ICLR 2026).
☆19 · Jun 9, 2026 · Updated last month
jacobkrantz / VertMetric
VertMetric: An abstractive summarization evaluation package. VERT stands for Versatile Evaluation of Reduced Texts.
☆12 · Dec 20, 2018 · Updated 7 years ago
mega002 / llm-interp-tau
Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University
☆330 · Feb 8, 2026 · Updated 5 months ago
lunary-ai / llm-benchmarks
LLM benchmarks
☆13 · Feb 22, 2024 · Updated 2 years ago
yoavgur / PISCES
🪝PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models
☆13 · Jun 28, 2026 · Updated 3 weeks ago
jvladika / HealthFC
HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking
☆14 · Apr 11, 2025 · Updated last year
Butanium / monte-carlo-tree-search-TSP
Monte Carlo tree search for the travelling salesman problem (MCTS for the TSP)
☆12 · Jun 18, 2022 · Updated 4 years ago
HazyResearch / scaling-verification
☆26 · Sep 4, 2025 · Updated 10 months ago
YibooZhao / cogvideox_vis_attention
☆10 · Nov 18, 2024 · Updated last year
cardiffnlp / dialz
The official repo for the Dialz Python library - a toolkit for steering vector research.
☆27 · Mar 26, 2026 · Updated 3 months ago
deemeetree / infranodus
A Node.Js / Neo4J tool that translates words and relations into network graphs and shows you how it all connects.
☆13 · Oct 24, 2019 · Updated 6 years ago
ekinakyurek / influence
Code for "Tracing Knowledge in Language Models Back to the Training Data"
☆40 · Dec 27, 2022 · Updated 3 years ago
xiye17 / EvalQAExpl
Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals.
☆17 · Apr 25, 2021 · Updated 5 years ago