JoshEngels/SAE-Probes



Code for reproducing our paper "Are Sparse Autoencoders Useful? A Case Study in Sparse Probing"

☆33

Alternatives and similar repositories for SAE-Probes

Users who are interested in SAE-Probes are comparing it to the repositories listed below.


JoshEngels / SAE-Dark-Matter
Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders"
☆23 · Feb 6, 2025 · Updated last year
matchten / LoRA-Models-for-SAEs
Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"
☆17 · Mar 31, 2025 · Updated last year
ExplainableML / sae-for-vlm
[NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
☆84 · Nov 27, 2025 · Updated 5 months ago
technion-cs-nlp / llm-arithmetic-heuristics
☆25 · May 20, 2025 · Updated last year
llnl / DeltaUQ
☆13 · Nov 30, 2023 · Updated 2 years ago
JoshEngels / FLINNG
A fast high dimensional near neighbor search algorithm based on group testing and locality sensitive hashing
☆23 · Dec 9, 2023 · Updated 2 years ago
OSU-NLP-Group / saev
Sparse autoencoders for vision
☆61 · May 12, 2026 · Updated 2 weeks ago
chanind / linear-relational
Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch
☆10 · Aug 7, 2024 · Updated last year
cvenhoff / steering-thinking-llms
☆36 · Jul 9, 2025 · Updated 10 months ago
gouki510 / Topology_of_Reasoning
☆42 · Jun 11, 2025 · Updated 11 months ago
leanprover-community / mathlib-changelog
☆15 · Apr 1, 2026 · Updated last month
xmed-lab / ECBM
ICLR 2024: Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations
☆23 · May 1, 2025 · Updated last year
XuchanBao / behavioral-self-awareness
☆36 · Feb 20, 2025 · Updated last year
chanind / tensor-theorem-prover
First-order logic theorem prover supporting unification with approximate vector similarity
☆14 · Mar 23, 2023 · Updated 3 years ago
EleutherAI / pile_dedupe
Pile Deduplication Code
☆18 · May 15, 2023 · Updated 3 years ago
ef-eng / react-native-swag-toggle
A Swag Toggle for React Native and Expo Web
☆14 · Jul 18, 2023 · Updated 2 years ago
nishantsubramani / steering_vectors
Steering Vector Repo from "Extracting Latent Steering Vectors from Pretrained Language Models" - ACL2022 Findings
☆11 · Mar 14, 2022 · Updated 4 years ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆256 · May 18, 2026 · Updated last week
dynamical-org / notebooks
Example use of dynamical.org weather datasets
☆38 · Updated this week
leo-liuzy / CodeUpdateArena
☆17 · Mar 20, 2025 · Updated last year
leonweber / spyrolog
Prolog interpreter with support for weak unification. Fork of https://bitbucket.org/cfbolz/pyrolog/
☆15 · Jun 23, 2020 · Updated 5 years ago
steelsojka / eslint-import-alias
ESLint rule for restricting imports to path aliases
☆20 · May 4, 2024 · Updated 2 years ago
Vilin97 / linear-algebra-done-right
☆12 · Jun 30, 2022 · Updated 3 years ago
monksealseal / weatherflow
☆15 · Apr 29, 2026 · Updated 3 weeks ago
chanind / amr-logic-converter
Convert Abstract Meaning Representation (AMR) into first-order logic
☆17 · Aug 7, 2024 · Updated last year
TransluceAI / jailbreaking-frontier-models
☆26 · Sep 3, 2025 · Updated 8 months ago
curt-tigges / crosslayer-coding
☆17 · Jul 9, 2025 · Updated 10 months ago
microsoft / iclr2019-learning-to-represent-edits
Code for the ICLR 2019 paper "Learning to Represent Edits"
☆13 · Dec 8, 2022 · Updated 3 years ago
OSU-NLP-Group / AgentAttack
☆22 · Oct 25, 2024 · Updated last year
microsoft / compositional-generalization-span-level-attention
code for the NAACL 2021 paper Compositional Generalization for Neural Semantic Parsing via Span-level Supervised Attention by Microsoft S…
☆12 · Apr 21, 2023 · Updated 3 years ago
THU-KEG / COPEN
The official code and dataset for EMNLP 2022 paper "COPEN: Probing Conceptual Knowledge in Pre-trained Language Models".
☆21 · Mar 9, 2023 · Updated 3 years ago
MohamedAghzal / llms-as-path-planners
☆18 · Sep 16, 2025 · Updated 8 months ago
dtch1997 / steering-bench
Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"
☆21 · Dec 14, 2024 · Updated last year
decoderesearch / SAELens
Training Sparse Autoencoders on Language Models
☆1,389 · Updated this week
noanabeshima / matryoshka-saes
☆28 · Nov 28, 2024 · Updated last year
Jayfeather1024 / Backdoor-Enhanced-Alignment
☆24 · Dec 8, 2024 · Updated last year
h9-tect / llama2-qlora-finetunined-Arabic
☆10 · Jul 21, 2023 · Updated 2 years ago
ajyl / mech_int_othelloGPT
☆10 · Nov 6, 2024 · Updated last year
salokr / Email-Event-Extraction
☆11 · Sep 6, 2024 · Updated last year