rmovva / HypotheSAEs
Hypothesizing interpretable relationships in text datasets using sparse autoencoders.
☆33Updated last week
Alternatives and similar repositories for HypotheSAEs
Users who are interested in HypotheSAEs are comparing it to the libraries listed below.
- Code for "Counterfactual Token Generation in Large Language Models", Arxiv 2024.☆28Updated 8 months ago
- ☆35Updated 2 years ago
- Closed-form polynomial approximations to neural networks☆13Updated 5 months ago
- ☆48Updated last month
- Understanding how features learned by neural networks evolve throughout training☆36Updated 8 months ago
- ☆29Updated last year
- Testing Language Models for Memorization of Tabular Datasets.☆34Updated 5 months ago
- ☆18Updated 2 years ago
- Minimum Description Length probing for neural network representations☆18Updated 5 months ago
- ☆23Updated 3 years ago
- Achieve error-rate fairness between societal groups for any score-based classifier.☆19Updated last year
- Quantification of Uncertainty with Adversarial Models☆30Updated 2 years ago
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration.☆13Updated last year
- Sparse and discrete interpretability tool for neural networks☆63Updated last year
- ❓y0 (pronounced "why not?") is for causal inference in Python☆51Updated this week
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile.☆18Updated last year
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding.☆40Updated 4 months ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023☆20Updated 2 years ago
- ICLR dataset☆32Updated 3 weeks ago
- Sample, estimate, aggregate: A recipe for causal discovery foundation models☆11Updated last year
- A weak supervision framework for (partial) labeling functions☆16Updated last year
- Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data!☆23Updated 3 months ago
- ☆101Updated 5 months ago
- Residual Quantization Autoencoder, used for interpreting LLMs☆12Updated 6 months ago
- Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible.☆42Updated 4 months ago
- MoodCat😼 classifies the mood of English sentences.☆14Updated 3 years ago
- ☆11Updated 5 months ago
- The Recognizing, Exploring, and Articulating Limitations in Machine Learning research tool (REAL ML) is a set of guided activities to hel…☆51Updated 3 years ago
- This is the code for the paper Jacobian-based Causal Discovery with Nonlinear ICA, demonstrating how identifiable representations (partic…☆18Updated 10 months ago
- CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction (arXiv 22)☆13Updated 3 years ago