jettjaniak/chainscope

Repository for the "Chain-of-Thought Reasoning In The Wild Is Not Always Faithful" paper

☆35

Alternatives and similar repositories for chainscope

Users interested in chainscope compare it to the repositories listed below.

milesaturpin / cot-unfaithfulness
☆57 · Oct 23, 2023 · Updated 2 years ago
raybears / cot-transparency
Improving transparency of large language models' reasoning
☆15 · Nov 25, 2025 · Updated 8 months ago
dtch1997 / steering-bench
Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"
☆22 · Dec 14, 2024 · Updated last year
redwoodresearch / alignment_faking_public
☆95 · Oct 8, 2025 · Updated 9 months ago
interp-reasoning / thought-anchors
⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.
☆137 · Oct 27, 2025 · Updated 9 months ago
Jiaxin-Wen / MisleadLM
Official code for our paper: "Language Models Learn to Mislead Humans via RLHF"
☆20 · Oct 11, 2024 · Updated last year
TruthfulAI-research / negation_neglect
Code for Negation Neglect
☆16 · May 22, 2026 · Updated 2 months ago
safety-research / open-source-alignment-faking
Open Source Replication of Anthropic's Alignment Faking Paper
☆58 · Apr 4, 2025 · Updated last year
tim-hua-01 / steering-eval-awareness-public
☆17 · Mar 16, 2026 · Updated 4 months ago
EleutherAI / attribute
☆16 · Nov 14, 2025 · Updated 8 months ago
cvenhoff / steering-thinking-llms
☆39 · Jul 9, 2025 · Updated last year
neelnanda-io / 1L-Sparse-Autoencoder
☆141 · Oct 28, 2023 · Updated 2 years ago
facebookresearch / decrypto
Implementation of the Decrypto benchmark for multi-agent reasoning and theory of mind.
☆22 · Jan 19, 2026 · Updated 6 months ago
redwoodresearch / Text-Steganography-Benchmark
Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography.
☆25 · Jan 26, 2024 · Updated 2 years ago
aws / to-smote-or-not
☆12 · Jun 17, 2024 · Updated 2 years ago
rgreenblatt / model_organism_public
☆15 · Jun 17, 2025 · Updated last year
GeorgeVern / lmcor
Code for the EACL 2024 paper: "Small Language Models Improve Giants by Rewriting Their Outputs"
☆12 · Apr 20, 2024 · Updated 2 years ago
nickkeesG / Pantheon
Experimental LLM interface exploring new ways to use AI to improve human thinking
☆21 · Apr 13, 2026 · Updated 3 months ago
am-bean / lingOly
A benchmark for language models based on the UK Linguistics Olympiad
☆12 · Mar 3, 2025 · Updated last year
decoderesearch / automated-interpretability
☆24 · Feb 13, 2026 · Updated 5 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆58 · May 1, 2025 · Updated last year
lciernik / similarity_consistency
Representational similarity consistency across dataset and their driving factors.
☆16 · Jun 20, 2025 · Updated last year
saprmarks / geometry-of-truth
☆114 · Aug 8, 2024 · Updated last year
Heidelberg-NLP / CC-SHAP
Code for "On Measuring Faithfulness of Natural Language Explanations"
☆23 · Jul 14, 2026 · Updated 2 weeks ago
rgreenblatt / control-evaluations
☆25 · May 25, 2024 · Updated 2 years ago
CornellDataScience / FiggieBot
Creating a game to play Figgie & Train an agent to play against
☆15 · Dec 3, 2022 · Updated 3 years ago
shobrook / pkld
Persistent caching for Python functions
☆19 · Dec 10, 2025 · Updated 7 months ago
FarnoushRJ / RelP
[NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in La…
☆29 · Nov 3, 2025 · Updated 8 months ago
batu-el / molochs-bargain
☆15 · May 7, 2026 · Updated 2 months ago
shobrook / syntaxis
Analyze usage patterns of imported modules in a Python program
☆18 · Nov 20, 2024 · Updated last year
alexjfoote / Neuron2Graph
Tools for exploring Transformer neuron behaviour, including input pruning and diversification.
☆10 · Jun 6, 2023 · Updated 3 years ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆35 · Jun 11, 2025 · Updated last year
microsoft / ConstrainedReasoner
☆13 · Aug 26, 2024 · Updated last year
alextamkin / active-learning-pretrained-models
Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh…
☆11 · Nov 22, 2022 · Updated 3 years ago
mariogrs / Simfast21
21cm code
☆19 · Jul 17, 2020 · Updated 6 years ago
alexbhatt / epidm
Epidemiological Data Management
☆13 · Dec 9, 2024 · Updated last year
shinington / facesec
Corresponding code to "FACESEC: A Fine-grained Robustness Evaluation Framework for Face Recognition Systems" @ CVPR 2021
☆13 · Jun 22, 2021 · Updated 5 years ago
nishadsinghi / sc-genrm-scaling
[COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…
☆15 · Oct 31, 2025 · Updated 8 months ago
Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆17 · Oct 21, 2025 · Updated 9 months ago