curt-tigges/probity


curt-tigges / probity

☆19

Alternatives and similar repositories for probity

Users who are interested in probity are comparing it to the libraries listed below.


curt-tigges / crosslayer-coding
☆18 · Jul 9, 2025 · Updated last year
ArthurConmy / MishformerLens
MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…
☆10 · Oct 7, 2024 · Updated last year
jammastergirish / LLMProbe
☆20 · Dec 10, 2025 · Updated 7 months ago
safety-research / false-facts
☆50 · Jul 4, 2025 · Updated last year
ndif-team / nnterp
Unified access to Large Language Model modules using NNsight
☆116 · Updated this week
science-of-finetuning / sparsity-artifacts-crosscoders
Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper.
☆17 · Jul 6, 2026 · Updated 2 weeks ago
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 · Jan 19, 2025 · Updated last year
jacobdunefsky / llm-steering-opt
Tools for optimizing steering vectors in LLMs.
☆22 · Apr 10, 2025 · Updated last year
tim-hua-01 / steering-eval-awareness-public
☆17 · Mar 16, 2026 · Updated 4 months ago
tim-lawson / mlsae
Multi-Layer Sparse Autoencoders (ICLR 2025)
☆30 · Feb 6, 2026 · Updated 5 months ago
TransluceAI / introspective-interp
Repository for "Training Language Models To Explain Their Own Computations"
☆23 · Jul 7, 2026 · Updated 2 weeks ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆58 · May 1, 2025 · Updated last year
goodfire-ai / scribe
☆86 · Feb 18, 2026 · Updated 5 months ago
ApolloResearch / deception-detection
☆44 · Feb 11, 2025 · Updated last year
callummcdougall / path_patching
Implementation of path patching & activation patching (will eventually add to TransformerLens).
☆15 · Jan 8, 2024 · Updated 2 years ago
TransluceAI / jailbreaking-frontier-models
☆28 · Sep 3, 2025 · Updated 10 months ago
tilde-research / sieve
Applying SAEs for fine-grained control
☆27 · Dec 15, 2024 · Updated last year
oclivegriffin / crosscode
A library for training crosscoders
☆17 · May 28, 2025 · Updated last year
science-of-finetuning / crosscoder_learning
Modified to support crosscoder training.
☆27 · Jul 2, 2026 · Updated 3 weeks ago
thejaminator / latteries
James' cookbook of evaluations and finetuning experiments
☆32 · Feb 19, 2026 · Updated 5 months ago
adamkarvonen / dictionary_learning_demo
☆26 · Aug 23, 2025 · Updated 11 months ago
science-of-finetuning / diffing-toolkit
A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.
☆78 · Updated this week
JCocola / weird-generalization-and-inductive-backdoors
Code and materials for "Weird Generalization and Inductive Backdoors"
☆41 · Jan 11, 2026 · Updated 6 months ago
edeyneka / pdf-reader-extension
☆13 · Mar 9, 2025 · Updated last year
jbloomAus / SAEDashboard
☆109 · May 23, 2026 · Updated 2 months ago
yash-srivastava19 / arrakis
Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
☆31 · Jul 8, 2026 · Updated 2 weeks ago
ejnnr / cupbearer
A library for mechanistic anomaly detection
☆22 · Jan 9, 2025 · Updated last year
interp-reasoning / thought-anchors.com
⚓️ Interactive playground for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.
☆18 · Dec 20, 2025 · Updated 7 months ago
Psi-Prod / ppx_system
ppx_system is a syntax extension to known operating system at compile time
☆12 · May 9, 2023 · Updated 3 years ago
cloneofsimo / minSAE
☆30 · Dec 2, 2024 · Updated last year
ApolloResearch / sample
Repository with sample code using Apollo's suggested engineering practices
☆15 · Dec 16, 2024 · Updated last year
gautierdag / kblaunch
CLI for fast launching jobs on a Kubernetes research cluster 🛸
☆15 · Jul 7, 2026 · Updated 2 weeks ago
Butanium / monte-carlo-tree-search-TSP
Monte Carlo tree search for the travelling salesman problem (MCTS for the TSP)
☆12 · Jun 18, 2022 · Updated 4 years ago
qiuhuachuan / latent-jailbreak
☆39 · May 21, 2024 · Updated 2 years ago
TransluceAI / docent
☆114 · Updated this week
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆68 · Oct 27, 2024 · Updated last year
dpaleka / llm-chess-proofgame
LLMs playing chess are sensitive to how the position came to be
☆25 · Feb 14, 2024 · Updated 2 years ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆358 · Apr 30, 2026 · Updated 2 months ago
saprmarks / feature-circuits
☆223 · Oct 14, 2025 · Updated 9 months ago