cvenhoff/steering-thinking-llms

cvenhoff / steering-thinking-llms

☆38

Alternatives and similar repositories for steering-thinking-llms

Users who are interested in steering-thinking-llms are comparing it to the repositories listed below.

tim-hua-01 / steering-eval-awareness-public
☆17 · Mar 16, 2026 · Updated 4 months ago
interp-reasoning / thought-anchors
⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.
☆137 · Oct 27, 2025 · Updated 8 months ago
psunlpgroup / FoVer
This repository includes code and materials for the paper "Efficient PRM Training Data Synthesis via Formal Verification" (ACL 2026 Findi…
☆18 · Apr 7, 2026 · Updated 3 months ago
UKPLab / tmlr2026-manifold-analysis
☆21 · Mar 3, 2026 · Updated 4 months ago
cvenhoff / thinking-llms-interp
☆25 · Jul 8, 2026 · Updated 2 weeks ago
shengliu66 / FractionalReason
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17 · Jun 30, 2025 · Updated last year
tim-lawson / mlsae
Multi-Layer Sparse Autoencoders (ICLR 2025)
☆30 · Feb 6, 2026 · Updated 5 months ago
fjzzq2002 / WeightWatch
Official Repository of Paper "Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs"
☆15 · Sep 25, 2025 · Updated 9 months ago
KihoPark / linear_rep_geometry
Code for 'The Linear Representation Hypothesis and the Geometry of Large Language Models' (ICML 2024)
☆125 · Feb 11, 2025 · Updated last year
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆210 · Mar 12, 2026 · Updated 4 months ago
yinzhangyue / EoT
Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication
☆21 · Mar 21, 2024 · Updated 2 years ago
Nix07 / finetuning
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity…
☆32 · Oct 27, 2025 · Updated 8 months ago
dreadnode / agent-lens
Agent observability and replay tooling for AI safety & interpretability research.
☆109 · Jun 19, 2026 · Updated last month
jettjaniak / chainscope
Repository for the "Chain-of-Thought Reasoning In The Wild Is Not Always Faithful" paper
☆35 · Mar 31, 2026 · Updated 3 months ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆183 · Apr 21, 2025 · Updated last year
TIGER-AI-Lab / Hierarchical-Reasoner
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]
☆64 · Apr 11, 2026 · Updated 3 months ago
shangshang-wang / Resa
Resa: Transparent Reasoning Models via SAEs
☆50 · Sep 23, 2025 · Updated 9 months ago
AIRI-Institute / SAE-Reasoning
☆99 · Mar 28, 2025 · Updated last year
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆266 · Updated this week
Harvard-CS-2881 / harvard-cs-2881-hw0
harvard-cs-2881-classroom-hw0-c2881-hw0 created by GitHub Classroom
☆16 · Jul 26, 2025 · Updated 11 months ago
goodfire-ai / memorization_kfac
☆28 · Nov 6, 2025 · Updated 8 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆58 · May 1, 2025 · Updated last year
Lucanyc / VISTA-Gym
☆27 · Mar 17, 2026 · Updated 4 months ago
rgreenblatt / model_organism_public
☆15 · Jun 17, 2025 · Updated last year
bpwu1 / confidence-regulation-neurons
Confidence Regulation Neurons in Language Models (NeurIPS 2024)
☆15 · Feb 1, 2025 · Updated last year
swei2001 / RouteSAEs
☆15 · Jan 2, 2026 · Updated 6 months ago
corl-team / steering-reasoning
Official implementation of "Steering LLM Reasoning Through Bias-Only Adaptation" and "Small Vectors, Big Effects: A Mechanistic Study of …
☆54 · Oct 7, 2025 · Updated 9 months ago
gentlyzhao / Hijacking
Code for Chain-of-Thought Hijacking
☆28 · Nov 10, 2025 · Updated 8 months ago
jiahai-feng / binding-iclr
☆19 · Mar 5, 2024 · Updated 2 years ago
tmlr-group / landscape-of-thoughts
[ICLR 2026] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models"
☆61 · May 21, 2026 · Updated 2 months ago
BatsResearch / crosslingual-test-time-scaling
Crosslingual Reasoning through Test-Time Scaling
☆21 · May 13, 2025 · Updated last year
JoshEngels / SAE-Probes
Code for reproducing our paper "Are Sparse Autoencoders Useful? A Case Study in Sparse Probing"
☆33 · Mar 31, 2025 · Updated last year
UCSB-NLP-Chang / ThinkPrune
☆46 · Sep 27, 2025 · Updated 9 months ago
Zanette-Labs / efficient-reasoning
☆75 · Apr 13, 2025 · Updated last year
VisualSphinx / VisualSphinx
☆17 · Jun 3, 2025 · Updated last year
nickjiang2378 / interp-embed
A toolkit for embedding text datasets with sparse autoencoders
☆30 · Mar 24, 2026 · Updated 3 months ago
g-luo / vlm_cross_modal_reps
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
☆34 · May 1, 2025 · Updated last year
TencentARC / TaCA
Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".
☆16 · Jun 20, 2023 · Updated 3 years ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆157 · Feb 21, 2025 · Updated last year