apartresearch / readingwhatwecan
Reading everything
★12, updated 7 months ago
Related projects
Alternatives and complementary repositories for readingwhatwecan
- (★24, updated 7 months ago)
- Measuring the situational awareness of language models (★33, updated 9 months ago)
- Mechanistic Interpretability for Transformer Models (★49, updated 2 years ago)
- datasets from the paper "Towards Understanding Sycophancy in Language Models" (★63, updated last year)
- (★44, updated last month)
- Keeping language models honest by directly eliciting knowledge encoded in their activations. (★186, updated last week)
- Starter templates for doing interpretability research (★63, updated last year)
- A dataset of alignment research and code to reproduce it (★69, updated last year)
- (★20, updated 3 years ago)
- One stop shop for all things carp (★59, updated 2 years ago)
- (★42, updated 8 months ago)
- we got you bro (★33, updated 3 months ago)
- A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations (★174, updated 2 years ago)
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). (★161, updated last month)
- Probabilistic LLM evaluations. [CogSci2023; ACL2023] (★72, updated 3 months ago)
- (★74, updated 4 months ago)
- (★240, updated 4 months ago)
- Erasing concepts from neural representations with provable guarantees (★211, updated last week)
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) (★14, updated 7 months ago)
- Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" paper, NAACL'22 (★64, updated 2 years ago)
- (★61, updated last year)
- (★18, updated 10 months ago)
- Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference… (★197, updated 5 months ago)
- (★122, updated 3 weeks ago)
- A domain-specific probabilistic programming language for modeling and inference with language models (★112, updated last year)
- (★45, updated last week)
- (★188, updated last month)
- Mechanistic Interpretability Visualizations using React (★200, updated 4 months ago)
- METR Task Standard (★127, updated 3 weeks ago)
- AI Safety Q&A web frontend (★35, updated last week)