nrimsky/LM-exp

LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces

☆104

Alternatives and similar repositories for LM-exp

Users who are interested in LM-exp are comparing it to the repositories listed below.

nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆240 · May 23, 2024 · Updated 2 years ago
epfl-dlab / llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
☆87 · Mar 11, 2024 · Updated 2 years ago
neelnanda-io / Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
☆14 · Feb 13, 2023 · Updated 3 years ago
KoyenaPal / future-lens
Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
☆21 · Oct 24, 2025 · Updated 8 months ago
annahdo / implementing_activation_steering
A collection of different ways to implement accessing and modifying internal model activations for LLMs
☆24 · Oct 18, 2024 · Updated last year
YuejiangLIU / csl
Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts
☆15 · Feb 26, 2024 · Updated 2 years ago
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆604 · Aug 7, 2025 · Updated 11 months ago
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆90 · Mar 7, 2025 · Updated last year
shauli-ravfogel / adv-kernel-removal
☆12 · Oct 23, 2022 · Updated 3 years ago
declare-lab / red-instruct
Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment"
☆111 · Mar 8, 2024 · Updated 2 years ago
uw-nsl / SafeDecoding
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆154 · Jul 19, 2024 · Updated 2 years ago
DanielSc4 / Dynamic-Activation-Composition
Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"
☆14 · Nov 22, 2024 · Updated last year
montemac / activation_additions
Algebraic value editing in pretrained language models
☆71 · Nov 1, 2023 · Updated 2 years ago
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆28 · Nov 20, 2024 · Updated last year
kxcloud / gradient-routing
☆11 · Dec 4, 2024 · Updated last year
msakarvadia / AttentionLens
Interpreting the latent space representations of attention head outputs for LLMs
☆39 · Aug 13, 2024 · Updated last year
UlisseMini / ana
The AI that helps you achieve your goals
☆11 · Feb 4, 2024 · Updated 2 years ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆358 · Apr 30, 2026 · Updated 2 months ago
aengusl / latent-adversarial-training
☆48 · Sep 29, 2024 · Updated last year
likenneth / honest_llama
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
☆581 · Jan 28, 2025 · Updated last year
saprmarks / feature-circuits
☆223 · Oct 14, 2025 · Updated 9 months ago
shengliu66 / ICV
Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
☆201 · Feb 13, 2025 · Updated last year
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆199 · Apr 30, 2026 · Updated 2 months ago
bilal-chughtai / rep-theory-mech-interp
☆31 · May 4, 2023 · Updated 3 years ago
ArthurConmy / Automatic-Circuit-Discovery
☆293 · Oct 1, 2024 · Updated last year
saprmarks / geometry-of-truth
☆113 · Aug 8, 2024 · Updated last year
zyxnlp / ICL-Interpretation-Analysis-Resources
Links to publications that focus on the interpretation and analysis of in-context learning
☆14 · Oct 17, 2024 · Updated last year
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆157 · Feb 21, 2025 · Updated last year
Teddy-Li / LLM-NLI-Analysis
☆15 · Jul 8, 2023 · Updated 3 years ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆265 · Feb 27, 2026 · Updated 4 months ago
danielmamay / mlab
Machine Learning for Alignment Bootcamp (MLAB).
☆34 · Jan 24, 2022 · Updated 4 years ago
jplhughes / dotfiles
Easily deploy my zsh and tmux configuration on new machines. Includes local and remote aliases to improve workflow.
☆15 · Apr 23, 2026 · Updated 2 months ago
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13 · Jan 26, 2025 · Updated last year
dvruette / concept-guidance
Code accompanying the paper "A Language Model's Guide Through Latent Space". It contains functionality for training and using concept vec…
☆21 · Feb 23, 2024 · Updated 2 years ago
andyzoujm / representation-engineering
Representation Engineering: A Top-Down Approach to AI Transparency
☆1,012 · Aug 14, 2024 · Updated last year
laramohan / wikillm
LLMs as Collaboratively Edited Knowledge Bases
☆52 · Feb 8, 2026 · Updated 5 months ago
callummcdougall / sae_visualizer
☆31 · Apr 4, 2024 · Updated 2 years ago
redwoodresearch / mlab
Machine Learning for Alignment Bootcamp
☆84 · Apr 27, 2022 · Updated 4 years ago
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆156 · Nov 14, 2024 · Updated last year