acsresearch / interlab
☆20, updated last year
Alternatives and similar repositories for interlab
Users who are interested in interlab are comparing it to the libraries listed below.
- A dataset of alignment research and code to reproduce it ☆77, updated 2 years ago
- General-Sum variant of the game Diplomacy for evaluating AIs. ☆30, updated last year
- METR Task Standard ☆160, updated 7 months ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆209, updated this week
- Tools for studying developmental interpretability in neural networks. ☆103, updated 3 months ago
- Repo for the paper on Escalation Risks of AI systems ☆44, updated last year
- Mechanistic Interpretability for Transformer Models ☆51, updated 3 years ago
- ☆138, updated 2 months ago
- ☆300, updated last year
- ☆103, updated this week
- Machine Learning for Alignment Bootcamp ☆78, updated 3 years ago
- Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference… ☆211, updated 3 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆111, updated this week
- Factored Cognition Primer: How to write compositional language model programs ☆49, updated 2 years ago
- ☆97, updated last month
- ControlArena is a collection of settings, model organisms and protocols - for running control experiments. ☆94, updated this week
- Interpreting how transformers simulate agents performing RL tasks ☆88, updated last year
- ☆57, updated this week
- Conversational chatbot to answer questions about AI Safety & Alignment based on information retrieved from the Alignment Research Dataset ☆15, updated this week
- ☆26, updated 3 months ago
- Command-line recursive question-answering with immutable contexts and explicit data store ☆26, updated 7 years ago
- Mechanistic Interpretability Visualizations using React ☆289, updated 9 months ago
- Inference API for many LLMs and other useful tools for empirical research ☆71, updated last week
- ☆20, updated 3 months ago
- ☆108, updated 7 months ago
- Experiments with representation engineering ☆12, updated last year
- Draw more samples ☆193, updated last year
- Redwood Research's transformer interpretability tools ☆14, updated 3 years ago
- Tools for exploring Transformer neuron behaviour, including input pruning and diversification. ☆20, updated last year
- Probabilistic LLM evaluations. [CogSci2023; ACL2023] ☆73, updated last year