apartresearch / readingwhatwecan
Reading everything
★15, updated last month
Alternatives and similar repositories for readingwhatwecan
Users who are interested in readingwhatwecan are comparing it to the repositories listed below.
- Keeping language models honest by directly eliciting knowledge encoded in their activations. (★211, updated this week)
- Mechanistic Interpretability for Transformer Models (★53, updated 3 years ago)
- A dataset of alignment research and code to reproduce it (★78, updated 2 years ago)
- (★104, updated last week)
- (★305, updated last year)
- (★29, updated last year)
- Erasing concepts from neural representations with provable guarantees (★238, updated 9 months ago)
- METR Task Standard (★163, updated 8 months ago)
- Measuring the situational awareness of language models (★38, updated last year)
- (★65, updated 2 years ago)
- A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations (★198, updated 3 years ago)
- (★22, updated 4 years ago)
- (★84, updated last year)
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper (★129, updated 3 years ago)
- (★100, updated last year)
- (★138, updated 3 months ago)
- Tools for studying developmental interpretability in neural networks. (★111, updated 4 months ago)
- (★130, updated 2 years ago)
- Probabilistic LLM evaluations. [CogSci2023; ACL2023] (★72, updated last year)
- (★113, updated 2 weeks ago)
- Mechanistic Interpretability Visualizations using React (★296, updated 10 months ago)
- A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations (★14, updated last year)
- PyTorch and NNsight implementation of AtP* (Kramar et al. 2024, DeepMind) (★20, updated 9 months ago)
- (★19, updated 2 years ago)
- Starter templates for doing interpretability research (★74, updated 2 years ago)
- Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference… (★215, updated 4 months ago)
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" (★95, updated 2 years ago)
- (★60, updated last month)
- The Happy Faces Benchmark (★15, updated 2 years ago)
- MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing… (★10, updated last year)