☆20 · Jan 28, 2024 · Updated 2 years ago
Alternatives and similar repositories for TransformerLens-intro
Users that are interested in TransformerLens-intro are comparing it to the libraries listed below.
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons ☆14 · Feb 13, 2023 · Updated 3 years ago
- TransformerLens + HuggingFace ☆11 · Nov 4, 2023 · Updated 2 years ago
- ☆107 · May 23, 2026 · Updated 3 weeks ago
- ☆22 · Jul 18, 2024 · Updated last year
- Official Code for our paper: "Language Models Learn to Mislead Humans via RLHF" ☆20 · Oct 11, 2024 · Updated last year
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆244 · Aug 11, 2025 · Updated 10 months ago
- Implementation of the paper "Improving the Accuracy-Robustness Trade-off of Classifiers via Adaptive Smoothing". ☆10 · Feb 6, 2024 · Updated 2 years ago
- Universal Neurons in GPT2 Language Models ☆30 · May 28, 2024 · Updated 2 years ago
- ☆12 · Jan 9, 2024 · Updated 2 years ago
- ☆31 · Apr 4, 2024 · Updated 2 years ago
- ☆16 · Dec 18, 2023 · Updated 2 years ago
- ☆14 · Mar 4, 2024 · Updated 2 years ago
- ☆15 · Dec 19, 2022 · Updated 3 years ago
- Python package to accelerate research on generalized out-of-distribution (OOD) detection. ☆15 · Jun 19, 2024 · Updated last year
- Code for CVPR 2024 Oral "Neural Lineage" ☆17 · Jun 18, 2024 · Updated last year
- This repository contains the implementation of Concept Activation Regions, a new framework to explain deep neural networks with human con… ☆17 · Oct 7, 2022 · Updated 3 years ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆261 · Feb 27, 2026 · Updated 3 months ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆220 · Jun 8, 2026 · Updated last week
- CNN-LSTM for intracranial hemorrhage detection ☆40 · Oct 2, 2020 · Updated 5 years ago
- A library for mechanistic interpretability of GPT-style language models ☆3,553 · Updated this week
- [CVPR 2024] Domain Gap Embeddings for Generative Dataset Augmentation ☆22 · Jun 19, 2024 · Updated last year
- Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works… ☆20 · May 29, 2024 · Updated 2 years ago
- Yazi plugin to paste clipboard content to file. ☆17 · Feb 16, 2026 · Updated 3 months ago
- Using sparse coding to find distributed representations used by neural networks. ☆305 · Nov 10, 2023 · Updated 2 years ago
- ☆24 · Nov 11, 2024 · Updated last year
- Captee is a macOS app that enables the sharing of links and content to Emacs Org Mode and Markdown-supporting tools. ☆14 · May 19, 2025 · Updated last year
- PyTorch adversarial attack baselines for ImageNet, CIFAR10, and MNIST (state-of-the-art attacks comparison) ☆20 · Mar 12, 2021 · Updated 5 years ago
- Certified robustness of deep neural networks ☆19 · Aug 20, 2024 · Updated last year
- The code for the Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness paper ☆23 · Nov 8, 2024 · Updated last year
- Run test on demand with support for many test runners ☆10 · Aug 13, 2024 · Updated last year
- Robust Principles: Architectural Design Principles for Adversarially Robust CNNs ☆24 · Jan 13, 2024 · Updated 2 years ago
- Demo repository showcasing how to use reusable workflows to build artifact attestations ☆16 · Jun 8, 2026 · Updated last week
- ☆1,131 · Updated this week
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆103 · Sep 21, 2023 · Updated 2 years ago
- Putting Visual Object Recognition in Context ☆18 · Aug 3, 2021 · Updated 4 years ago
- Beautiful Personal Task Management webapp (WIP) ☆12 · Nov 23, 2023 · Updated 2 years ago
- Code for the paper: Dense Reward for Free in Reinforcement Learning from Human Feedback (ICML 2024) by Alex J. Chan, Hao Sun, Samuel Holt… ☆38 · Aug 11, 2024 · Updated last year
- GPI-Space: Memory Driven Computing and Big Data ☆10 · Mar 17, 2026 · Updated 2 months ago
- Config files for a shortcut system based on Karabiner (via Goku), Yabai, and Übersicht (via Nero.) ☆11 · Jan 16, 2026 · Updated 5 months ago