apartresearch / readingwhatwecan
Reading everything
☆14 · Updated 2 months ago
Alternatives and similar repositories for readingwhatwecan
Users interested in readingwhatwecan are comparing it to the libraries listed below.
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆207 · Updated 2 weeks ago
- ☆21 · Updated 3 years ago
- ☆28 · Updated last year
- A dataset of alignment research and code to reproduce it ☆77 · Updated 2 years ago
- Mechanistic Interpretability for Transformer Models ☆51 · Updated 3 years ago
- The Happy Faces Benchmark ☆15 · Updated last year
- Measuring the situational awareness of language models ☆35 · Updated last year
- ☆55 · Updated 9 months ago
- ☆134 · Updated 7 months ago
- Redwood Research's transformer interpretability tools ☆14 · Updated 3 years ago
- Utilities for the HuggingFace transformers library ☆68 · Updated 2 years ago
- ☆19 · Updated 2 years ago
- Erasing concepts from neural representations with provable guarantees ☆228 · Updated 5 months ago
- METR Task Standard ☆151 · Updated 4 months ago
- Emergent world representations: exploring a sequence model trained on a synthetic task ☆181 · Updated last year
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆81 · Updated last year
- ☆29 · Updated 10 months ago
- ☆280 · Updated 11 months ago
- PyTorch implementation of OpenAI's Procgen PPO baseline, built from scratch ☆14 · Updated last year
- A library for bridging Python and HTML/JavaScript (via Svelte) for creating interactive visualizations ☆191 · Updated 3 years ago
- ☆19 · Updated last year
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆27 · Updated last year
- (Model-written) LLM evals library ☆18 · Updated 6 months ago
- ☆9 · Updated 10 months ago
- ☆270 · Updated last year
- Starter templates for doing interpretability research ☆71 · Updated last year
- Tools for studying developmental interpretability in neural networks ☆95 · Updated this week
- ☆19 · Updated 2 years ago
- we got you bro ☆35 · Updated 10 months ago
- ☆99 · Updated 4 months ago