Interpreting how transformers simulate agents performing RL tasks
☆90 · Oct 23, 2023 · Updated 2 years ago
Alternatives and similar repositories for DecisionTransformerInterpretability
Users who are interested in DecisionTransformerInterpretability are comparing it to the libraries listed below
- Research project for Deep Reinforcement Learning using Decision Transformer ☆16 · May 12, 2023 · Updated 2 years ago
- Mechanistic Interpretability for Transformer Models ☆53 · Jun 1, 2022 · Updated 3 years ago
- This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks. ☆34 · Oct 28, 2025 · Updated 4 months ago
- Implementation of Multi-Game Decision Transformers in PyTorch ☆49 · Feb 11, 2023 · Updated 3 years ago
- ☆20 · Feb 17, 2023 · Updated 3 years ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆217 · Feb 23, 2026 · Updated last week
- ☆29 · Apr 30, 2024 · Updated last year
- Minimal implementation of Decision Transformer: Reinforcement Learning via Sequence Modeling in PyTorch for mujoco control tasks in Open… ☆288 · Jun 10, 2022 · Updated 3 years ago
- Generative cellular automaton-like learning environments for RL. ☆20 · Jan 30, 2025 · Updated last year
- ☆18 · Jul 10, 2022 · Updated 3 years ago
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆240 · Aug 11, 2025 · Updated 6 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆247 · Updated this week
- Counterfactual explanations for Reinforcement Learning agents on Atari ☆12 · Apr 3, 2023 · Updated 2 years ago
- ☆11 · Apr 6, 2024 · Updated last year
- A python package for protein inference in Mass Spectrometric data analysis. ☆10 · Jun 6, 2022 · Updated 3 years ago
- Workshop that will take you from Graph Neural Networks (GNNs) to Transformers, architectures which have led to numerous breakthrough achi… ☆13 · Sep 11, 2023 · Updated 2 years ago
- ☆12 · Dec 20, 2024 · Updated last year
- TransformerLens + HuggingFace ☆11 · Nov 4, 2023 · Updated 2 years ago
- Implementation of ICML 2023 paper: Future-conditioned Unsupervised Pretraining for Decision Transformer ☆29 · Jul 25, 2023 · Updated 2 years ago
- ☆209 · Oct 14, 2025 · Updated 4 months ago
- This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce… ☆25 · Feb 16, 2026 · Updated 2 weeks ago
- Fluent dreaming for language models ☆13 · Jul 22, 2024 · Updated last year
- ☆12 · Jul 12, 2024 · Updated last year
- Applies ROME and MEMIT on Mamba-S4 models ☆14 · Apr 5, 2024 · Updated last year
- ☆18 · Oct 3, 2024 · Updated last year
- ☆10 · Jul 15, 2024 · Updated last year
- Official code repository for Prompt-DT. ☆121 · Aug 3, 2022 · Updated 3 years ago
- A library for mechanistic interpretability of GPT-style language models ☆3,133 · Updated this week
- AFEC: Active Forgetting of Negative Transfer in Continual Learning (NeurIPS 2021) ☆28 · Sep 26, 2023 · Updated 2 years ago
- Procgen2: A community maintained fork of procgen ☆12 · Aug 25, 2022 · Updated 3 years ago
- Code for Discovered Policy Optimisation (NeurIPS 2022) ☆12 · Jun 15, 2023 · Updated 2 years ago
- Decision Transformer for offline single-agent autonomous highway driving ☆28 · Jun 19, 2023 · Updated 2 years ago
- ☆13 · Feb 25, 2025 · Updated last year
- A web based platform for collecting human actions in reinforcement learning environments ☆31 · Sep 10, 2025 · Updated 5 months ago
- ☆30 · Aug 20, 2021 · Updated 4 years ago
- A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation (ICLR2023) ☆14 · Feb 3, 2023 · Updated 3 years ago
- Companion code to TRC paper: Daniel A. Lazar, Erdem Bıyık, Dorsa Sadigh, Ramtin Pedarsani. "Learning how to Dynamically Route Autonomous … ☆16 · Aug 9, 2021 · Updated 4 years ago
- ☆14 · Dec 16, 2021 · Updated 4 years ago
- Repository for "Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics" … ☆20 · Jun 16, 2024 · Updated last year