apartresearch / readingwhatwecan
📚📚📚📚📚📚📚📚📚 Reading everything
☆13 · Updated 10 months ago
Alternatives and similar repositories for readingwhatwecan:
Users who are interested in readingwhatwecan are comparing it to the libraries listed below:
- A dataset of alignment research and code to reproduce it ☆74 · Updated last year
- Probabilistic LLM evaluations. [CogSci2023; ACL2023] ☆73 · Updated 7 months ago
- Mechanistic Interpretability for Transformer Models ☆49 · Updated 2 years ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆197 · Updated last week
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆71 · Updated last year
- ☆19 · Updated last year
- An environment for learning formal mathematical reasoning from scratch ☆62 · Updated 6 months ago
- ☆84 · Updated 2 weeks ago
- Erasing concepts from neural representations with provable guarantees ☆223 · Updated last month
- 🧠 Starter templates for doing interpretability research ☆67 · Updated last year
- Measuring the situational awareness of language models ☆34 · Updated last year
- ☆33 · Updated last year
- ☆84 · Updated last month
- ☆26 · Updated 10 months ago
- ☆60 · Updated last month
- Factored Cognition Primer: How to write compositional language model programs ☆48 · Updated 2 years ago
- ☆45 · Updated 11 months ago
- ☆129 · Updated 4 months ago
- One stop shop for all things carp ☆59 · Updated 2 years ago
- Functional Benchmarks and the Reasoning Gap ☆84 · Updated 5 months ago
- Interpreting how transformers simulate agents performing RL tasks ☆77 · Updated last year
- 🦠 DeepDecipher: An open source API to MLP neurons ☆9 · Updated 10 months ago
- ☆78 · Updated 8 months ago
- ☆90 · Updated 9 months ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆114 · Updated 2 years ago
- Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" paper, NAACL'22 ☆64 · Updated 2 years ago
- ☆13 · Updated last year
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆18 · Updated last month
- ☆255 · Updated 8 months ago
- Evaluating the Moral Beliefs Encoded in LLMs ☆24 · Updated 2 months ago