apartresearch / readingwhatwecan
Reading everything
☆15 Updated 2 months ago
Alternatives and similar repositories for readingwhatwecan
Users that are interested in readingwhatwecan are comparing it to the libraries listed below
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆212 Updated last week
- ☆22 Updated 4 years ago
- ☆307 Updated last year
- Mechanistic Interpretability for Transformer Models ☆53 Updated 3 years ago
- A dataset of alignment research and code to reproduce it ☆78 Updated 2 years ago
- A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations ☆198 Updated 3 years ago
- ☆281 Updated last year
- ☆106 Updated last week
- Erasing concepts from neural representations with provable guarantees ☆238 Updated 9 months ago
- Emergent world representations: Exploring a sequence model trained on a synthetic task ☆191 Updated 2 years ago
- The Happy Faces Benchmark ☆15 Updated 2 years ago
- METR Task Standard ☆167 Updated 9 months ago
- ☆139 Updated 3 months ago
- ☆84 Updated last year
- Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference… ☆215 Updated 5 months ago
- ☆100 Updated last year
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆130 Updated 3 years ago
- Utilities for the HuggingFace transformers library ☆71 Updated 2 years ago
- Conversational chatbot to answer questions about AI Safety & Alignment based on information retrieved from the Alignment Research Dataset ☆15 Updated last month
- AI Safety Q&A web frontend ☆41 Updated this week
- ☆30 Updated last year
- ControlArena is a collection of settings, model organisms and protocols for running control experiments. ☆128 Updated this week
- ☆65 Updated 2 years ago
- Probabilistic LLM evaluations. [CogSci2023; ACL2023] ☆72 Updated last year
- A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations ☆14 Updated last year
- ☆17 Updated last week
- ☆29 Updated last year
- Mechanistic Interpretability Visualizations using React ☆301 Updated 11 months ago
- Tools for studying developmental interpretability in neural networks. ☆114 Updated 4 months ago
- See the issue board for the current status of active and prospective projects! ☆65 Updated 3 years ago