jessicarumbelow / Backwards
☆75 · Updated 6 months ago
Alternatives and similar repositories for Backwards:
Users who are interested in Backwards are comparing it to the libraries listed below.
- Erasing concepts from neural representations with provable guarantees · ☆220 · Updated last month
- Extract full next-token probabilities via language model APIs · ☆230 · Updated 10 months ago
- Mechanistic Interpretability for Transformer Models · ☆49 · Updated 2 years ago
- Mechanistic Interpretability Visualizations using React · ☆220 · Updated last month
- See the issue board for the current status of active and prospective projects! · ☆65 · Updated 2 years ago
- ☆247 · Updated 6 months ago
- Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference… · ☆200 · Updated last week
- Keeping language models honest by directly eliciting knowledge encoded in their activations. · ☆192 · Updated this week
- Utilities for the HuggingFace transformers library · ☆63 · Updated last year
- ☆114 · Updated last year
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper · ☆108 · Updated 2 years ago
- A dataset of alignment research and code to reproduce it · ☆73 · Updated last year
- Probabilistic LLM evaluations. [CogSci2023; ACL2023] · ☆72 · Updated 5 months ago
- ☆258 · Updated 10 months ago
- An interactive exploration of Transformer programming. · ☆255 · Updated last year
- ☆58 · Updated 2 years ago
- ☆25 · Updated 9 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface · ☆78 · Updated last month
- ☆80 · Updated 3 months ago
- ☆135 · Updated this week
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). · ☆176 · Updated last month
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions · ☆68 · Updated last year
- METR Task Standard · ☆135 · Updated 2 weeks ago
- Our solution for the arc challenge 2024 · ☆84 · Updated last month
- ☆79 · Updated last week
- ☆91 · Updated last year
- ☆125 · Updated 2 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. · ☆74 · Updated this week
- Datasets collection and preprocessings framework for NLP extreme multitask learning · ☆171 · Updated last week
- ☆115 · Updated this week