callummcdougall/TransformerLens-intro

☆20

Alternatives and similar repositories for TransformerLens-intro

Users that are interested in TransformerLens-intro are comparing it to the libraries listed below.

neelnanda-io / Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
☆14 · Feb 13, 2023 · Updated 3 years ago
acsresearch / interlab
☆22 · Jul 18, 2024 · Updated 2 years ago
Jiaxin-Wen / MisleadLM
Official Code for our paper: "Language Models Learn to Mislead Humans via RLHF"
☆20 · Oct 11, 2024 · Updated last year
jplhughes / dotfiles
Easily deploy my zsh and tmux configuration on new machines. Includes local and remote aliases to improve workflow.
☆15 · Apr 23, 2026 · Updated 3 months ago
callummcdougall / path_patching
Implementation of path patching & activation patching (will eventually add to TransformerLens).
☆15 · Jan 8, 2024 · Updated 2 years ago
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆247 · Aug 11, 2025 · Updated 11 months ago
jbloomAus / SAEDashboard
☆109 · May 23, 2026 · Updated 2 months ago
rawsh / mirrorllm
various experiments for scaling inference time compute with small reasoning models
☆17 · Jan 16, 2025 · Updated last year
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆30 · May 28, 2024 · Updated 2 years ago
MadryLab / pretraining-distribution-shift-robustness
☆14 · Mar 4, 2024 · Updated 2 years ago
tmlr-group / DAL
[NeurIPS 2023] "Learning to Augment Distributions for Out-of-distribution Detection"
☆11 · Nov 14, 2023 · Updated 2 years ago
zhiyugege / FreqBias
☆16 · Dec 18, 2023 · Updated 2 years ago
Ytchen981 / CSA
☆15 · Dec 19, 2022 · Updated 3 years ago
yu-rp / NeuralLineage
Code for CVPR 2024 Oral "Neural Lineage"
☆17 · Jun 18, 2024 · Updated 2 years ago
mhadidg / abbr-cli
CLI to look up abbreviations for terms
☆26 · Sep 24, 2021 · Updated 4 years ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆267 · Feb 27, 2026 · Updated 4 months ago
bbartoldson / Adversarial-Robustness-Limits
ICML 2024 Paper "Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies"
☆18 · Jul 10, 2024 · Updated 2 years ago
paulbricman / semantica
Extending conceptual thinking with semantic embeddings.
☆36 · Sep 21, 2021 · Updated 4 years ago
maxdreyer / PURE
Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works…
☆20 · May 29, 2024 · Updated 2 years ago
terence-bonhomme / rnyt
take notes in RemNote with a YouTube video
☆23 · Mar 15, 2023 · Updated 3 years ago
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆307 · Nov 10, 2023 · Updated 2 years ago
poloclub / robust-principles
Robust Principles: Architectural Design Principles for Adversarially Robust CNNs
☆24 · Jan 13, 2024 · Updated 2 years ago
safety-research / impossiblebench
Official Inspect Implementation for "ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases"
☆48 · Dec 1, 2025 · Updated 7 months ago
ndb796 / PyTorch-Adversarial-Attack-Baselines-for-ImageNet-CIFAR10-MNIST
PyTorch adversarial attack baselines for ImageNet, CIFAR10, and MNIST (state-of-the-art attacks comparison)
☆20 · Mar 12, 2021 · Updated 5 years ago
jamesb93 / hammerspoon
my hammerspoon config
☆11 · Jun 8, 2025 · Updated last year
stanislavfort / ensemble-everything-everywhere
The code for the Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness paper
☆23 · Nov 8, 2024 · Updated last year
github / artifact-attestations-workflows
Demo repository showcasing how to use reusable workflows to build artifact attestations
☆16 · Updated this week
sokcertifiedrobustness / sokcertifiedrobustness.github.io
Keeps track of popular provable training and verification approaches towards robust neural networks, including leaderboards on popular da…
☆19 · Jun 12, 2024 · Updated 2 years ago
coldenate / zotero-remnote-connector
A Citation Manager and Zotero Integration for RemNote! Cite research all within your knowledge base!
☆29 · Jan 22, 2026 · Updated 6 months ago
alexis- / BitShelter
Snapshots & Backups for Windows
☆32 · Apr 22, 2024 · Updated 2 years ago
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆105 · Sep 21, 2023 · Updated 2 years ago
xavihart / PDM-Pure
PDM-based Purifier
☆23 · Nov 5, 2024 · Updated last year
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆156 · Sep 14, 2022 · Updated 3 years ago
gradient-ai / Graphcore-HuggingFace
A new repo to demonstrate tutorials for using HuggingFace on Graphcore IPUs.
☆12 · May 3, 2023 · Updated 3 years ago
sapegin / sapegin.me
My home page and blog
☆16 · Jul 18, 2026 · Updated last week
cc-hpc-itwm / gpispace
GPI-Space: Memory Driven Computing and Big Data
☆10 · Mar 17, 2026 · Updated 4 months ago
recursal / minmodmon
Mini Model Daemon
☆13 · Nov 9, 2024 · Updated last year
lucaorio / intentio
Config files for a shortcut system based on Karabiner (via Goku), Yabai, and Übersicht (via Nero.)
☆11 · Jan 16, 2026 · Updated 6 months ago
samundra / Nepali-Keyboard-Layout
MacOS का लागी पारम्परिक नेपाली किबोर्ड लेआऊट (Traditional Nepali Keyboard Layout for MacOS)
☆11 · Nov 14, 2024 · Updated last year