rmovva / HypotheSAEs
Hypothesizing interpretable relationships in text datasets using sparse autoencoders.
☆43Updated last week
Alternatives and similar repositories for HypotheSAEs
Users who are interested in HypotheSAEs are comparing it to the libraries listed below.
- ☆108Updated 7 months ago
- Discovering Data-driven Hypotheses in the Wild☆111Updated 3 months ago
- ☆54Updated 3 months ago
- PAIR.withgoogle.com and friends' work on interpretability methods☆203Updated this week
- ☆249Updated 6 months ago
- This is the official repository for HypoGeniC (Hypothesis Generation in Context) and HypoRefine, which are automated, data-driven tools t…☆87Updated last week
- ☆49Updated 10 months ago
- ☆235Updated last month
- The Prism Alignment Project☆79Updated last year
- ☆71Updated 3 weeks ago
- A lightweight library for Bayesian analysis of LLM evals (ICML 2025 Spotlight Position Paper)☆21Updated 3 months ago
- Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)☆47Updated 2 months ago
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models☆16Updated 2 months ago
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks☆47Updated 9 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding.☆42Updated 6 months ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs.☆54Updated 11 months ago
- Automated Qualitative Analysis of LLMs (ICLR 2025)☆46Updated 2 months ago
- PyTorch library for Active Fine-Tuning☆91Updated 2 weeks ago
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration.☆13Updated last year
- ☆124Updated last week
- ☆115Updated last year
- Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty"☆85Updated last year
- Official Python client library for the OpenReview API☆203Updated this week
- Extending Conformal Prediction to LLMs☆67Updated last year
- ☆24Updated 2 years ago
- SDLG is an efficient method to accurately estimate aleatoric semantic uncertainty in LLMs☆27Updated last year
- ☆28Updated 6 months ago
- ☆74Updated last year
- Papers about scientific hypothesis generation with large language models (LLMs).☆74Updated 3 months ago
- An attribution library for LLMs☆42Updated last year