rmovva / HypotheSAEs
Hypothesizing interpretable relationships in text datasets using sparse autoencoders.
☆31 Updated 2 weeks ago
Alternatives and similar repositories for HypotheSAEs
Users who are interested in HypotheSAEs are comparing it to the repositories listed below.
- ☆35 Updated 2 years ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 Updated 2 years ago
- Understanding how features learned by neural networks evolve throughout training ☆36 Updated 8 months ago
- Code for "Counterfactual Token Generation in Large Language Models", Arxiv 2024. ☆27 Updated 7 months ago
- Achieve error-rate fairness between societal groups for any score-based classifier. ☆19 Updated last year
- Expertise modeling for the OpenReview matching system ☆39 Updated last week
- ☆99 Updated 4 months ago
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch ☆15 Updated last year
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration. ☆13 Updated last year
- MoodCat😼 classifies the mood of English sentences. ☆14 Updated 3 years ago
- Data Benchmarking ☆21 Updated last year
- Quantification of Uncertainty with Adversarial Models ☆29 Updated last year
- Attribution-based Parameter Decomposition ☆25 Updated 2 weeks ago
- ☆36 Updated 2 years ago
- ☆29 Updated last year
- Closed-form polynomial approximations to neural networks ☆13 Updated 4 months ago
- BenchBench is a Python package to evaluate multi-task benchmarks. ☆15 Updated 11 months ago
- Simple and scalable tools for data-driven pretraining data selection. ☆24 Updated 2 weeks ago
- A weak supervision framework for (partial) labeling functions ☆16 Updated 11 months ago
- ☆55 Updated 7 months ago
- ☆22 Updated last year
- ☆26 Updated 2 years ago
- ☆23 Updated 3 years ago
- ☆12 Updated 10 months ago
- Highlight errors in a bib file: missing URLs, capitalization protection, etc ☆27 Updated last year
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆43 Updated 6 months ago
- A lightweight library for Bayesian analysis of LLM evals (ICML 2025 Spotlight Position Paper) ☆16 Updated last month
- Discovering Data-driven Hypotheses in the Wild ☆94 Updated 2 weeks ago
- Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost ☆8 Updated last year
- Minimum Description Length probing for neural network representations ☆18 Updated 5 months ago