jbloomAus/DecisionTransformerInterpretability

jbloomAus / DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks

☆90

Alternatives and similar repositories for DecisionTransformerInterpretability

Users who are interested in DecisionTransformerInterpretability are comparing it to the libraries listed below.

haraldger / DRL-DecisionTransformer
Research project for Deep Reinforcement Learning using Decision Transformer
☆16 · May 12, 2023 · Updated 3 years ago
understanding-search / maze-transformer
This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks.
☆35 · Oct 28, 2025 · Updated 8 months ago
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆53 · Jun 1, 2022 · Updated 4 years ago
redwoodresearch / remix_public
☆20 · Feb 17, 2023 · Updated 3 years ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆221 · Updated this week
etaoxing / multigame-dt
Implementation of Multi-Game Decision Transformers in PyTorch
☆49 · Feb 11, 2023 · Updated 3 years ago
luchris429 / discovered-policy-optimisation
Code for Discovered Policy Optimisation (NeurIPS 2022)
☆12 · Jun 15, 2023 · Updated 3 years ago
nikhilbarhate99 / min-decision-transformer
Minimal implementation of Decision Transformer: Reinforcement Learning via Sequence Modeling in PyTorch for mujoco control tasks in Open…
☆294 · Jun 10, 2022 · Updated 4 years ago
jannik-brinkmann / hugginglens
TransformerLens + HuggingFace
☆11 · Nov 4, 2023 · Updated 2 years ago
smearle / autoverse
Generative cellular automaton-like learning environments for RL.
☆20 · Jan 30, 2025 · Updated last year
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆267 · Feb 27, 2026 · Updated 4 months ago
saprmarks / feature-circuits
☆223 · Oct 14, 2025 · Updated 9 months ago
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆65 · Feb 12, 2024 · Updated 2 years ago
Farama-Foundation / Procgen-Staging
Procgen2: A community-maintained fork of procgen
☆12 · Aug 25, 2022 · Updated 3 years ago
MassDynamics / protein-inference
A python package for protein inference in Mass Spectrometric data analysis.
☆10 · Jun 6, 2022 · Updated 4 years ago
gradient-ai / Graphcore-HuggingFace
A new repo to demonstrate tutorials for using HuggingFace on Graphcore IPUs.
☆12 · May 3, 2023 · Updated 3 years ago
hcmlab / GANterfactual-RL
Counterfactual explanations for Reinforcement Learning agents on Atari
☆12 · Apr 3, 2023 · Updated 3 years ago
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆247 · Aug 11, 2025 · Updated 11 months ago
1a3orn / very-simple-moe
Extremely simple MoE implementation, mostly based off Switch Transformer
☆13 · Feb 26, 2024 · Updated 2 years ago
ArthurConmy / Automatic-Circuit-Discovery
☆293 · Oct 1, 2024 · Updated last year
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆998 · Updated this week
lasr-spelling / sae-spelling
Code for the paper "A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders"
☆15 · Dec 28, 2025 · Updated 6 months ago
google-deepmind / cartesian-frames
A formalisation of Cartesian Frames, a perspective on embedded agency, in the HOL theorem prover.
☆22 · Dec 20, 2021 · Updated 4 years ago
vedantpalit / Towards-Vision-Language-Mechanistic-Interpretability
This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce…
☆25 · Feb 16, 2026 · Updated 5 months ago
TextZip / go1-rl-kit
Deployment kit for Unitree Go1 Edu
☆24 · Dec 14, 2024 · Updated last year
google-deepmind / diplomacy
☆60 · Apr 22, 2024 · Updated 2 years ago
decoderesearch / SAELens
Training Sparse Autoencoders on Language Models
☆1,484 · Updated this week
Farama-Foundation / CrowdPlay
A web-based platform for collecting human actions in reinforcement learning environments
☆31 · Sep 10, 2025 · Updated 10 months ago
Hritikbansal / jpo
☆13 · Jul 2, 2025 · Updated last year
aisa-group / decomposing-eval-awareness
Decomposing and measuring evaluation awareness in existing benchmarks and our proposed EvalAwareBench.
☆19 · Jun 1, 2026 · Updated last month
avillaflor / SPLT-transformer
☆18 · Jul 10, 2022 · Updated 4 years ago
ethz-spylab / superhuman-ai-consistency
☆30 · Jun 19, 2023 · Updated 3 years ago
brownirl / lambda_discrepancy
Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy
☆24 · Oct 28, 2024 · Updated last year
FangchenLiu / MaskDP_public
Code for "Masked Autoencoding for Scalable and Generalizable Decision Making". NeurIPS 2022
☆47 · Mar 12, 2024 · Updated 2 years ago
mxu34 / prompt-dt
Official code repository for Prompt-DT.
☆123 · Aug 3, 2022 · Updated 3 years ago
stanfordnlp / pyvene
Stanford NLP Python library for understanding and improving PyTorch models via interventions
☆893 · Mar 6, 2026 · Updated 4 months ago
chenhongge / SA_DQN
[NeurIPS 2020, Spotlight] State-Adversarial DQN (SA-DQN) for robust deep reinforcement learning
☆35 · Feb 22, 2021 · Updated 5 years ago
mishajw / repeng
Experiments with representation engineering
☆14 · Feb 28, 2024 · Updated 2 years ago
morning9393 / Optimal-Baseline-for-Multi-agent-Policy-Gradients
☆30 · Aug 20, 2021 · Updated 4 years ago