likenneth / othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
☆181 Updated last year
Alternatives and similar repositories for othello_world
Users interested in othello_world are comparing it to the libraries listed below.
- ☆116 Updated 9 months ago
- ☆121 Updated last year
- ☆222 Updated 7 months ago
- [NeurIPS 2023] Learning Transformer Programs ☆161 Updated last year
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆125 Updated 2 years ago
- ☆179 Updated last year
- Mechanistic Interpretability for Transformer Models ☆51 Updated 2 years ago
- ☆66 Updated 2 years ago
- ☆83 Updated 9 months ago
- Sparse Autoencoder Training Library ☆50 Updated 3 weeks ago
- Mechanistic Interpretability Visualizations using React ☆251 Updated 5 months ago
- A library for efficient patching and automatic circuit discovery. ☆65 Updated last month
- ☆96 Updated 3 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆180 Updated this week
- Tools for studying developmental interpretability in neural networks. ☆90 Updated 4 months ago
- ☆93 Updated 3 months ago
- ☆120 Updated 6 months ago
- Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference… ☆207 Updated 4 months ago
- ☆205 Updated last year
- ☆131 Updated 6 months ago
- Materials for ConceptARC paper ☆94 Updated 6 months ago
- Bootstrapping ARC ☆123 Updated 6 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆200 Updated 5 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆116 Updated last year
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆204 Updated this week
- Redwood Research's transformer interpretability tools ☆15 Updated 3 years ago
- ☆93 Updated 10 months ago
- unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆78 Updated 2 years ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆134 Updated last year
- ☆83 Updated 10 months ago