likenneth / othello_world

Emergent world representations: Exploring a sequence model trained on a synthetic task

☆184

Alternatives and similar repositories for othello_world

Users who are interested in othello_world are comparing it to the repositories listed below.

anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆127 Updated 2 years ago
redwoodresearch / Easy-Transformer
☆121 Updated 11 months ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆207 Updated last week
ArthurConmy / Automatic-Circuit-Discovery
☆233 Updated 10 months ago
neelnanda-io / 1L-Sparse-Autoencoder
☆123 Updated last year
princeton-nlp / TransformerPrograms
[NeurIPS 2023] Learning Transformer Programs
☆162 Updated last year
collin-burns / discovering_latent_knowledge
☆274 Updated last year
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆207 Updated 7 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆200 Updated this week
GFNOrg / gfn-lm-tuning
☆183 Updated last year
KihoPark / linear_rep_geometry
☆103 Updated 5 months ago
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆512 Updated last year
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆70 Updated 2 years ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆272 Updated 7 months ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆231 Updated 6 months ago
Sea-Snell / grokking
Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
☆77 Updated 3 years ago
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆51 Updated 3 years ago
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024)
☆190 Updated last year
mechanistic-interpretability-grokking / progress-measures-paper
☆68 Updated 2 years ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆73 Updated last week
saprmarks / geometry-of-truth
☆87 Updated 11 months ago
aypan17 / machiavelli
☆137 Updated last week
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆54 Updated 3 months ago
callummcdougall / sae_visualizer
☆28 Updated last year
Sea-Snell / Implicit-Language-Q-Learning
Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆208 Updated 2 years ago
CarperAI / autocrit
A repository for transformer critique learning and generation
☆90 Updated last year
KihoPark / LLM_Categorical_Hierarchical_Representations
☆104 Updated 5 months ago
gabegrand / world-models
☆208 Updated 2 years ago
victorvikram / ConceptARC
Materials for ConceptARC paper
☆97 Updated 8 months ago
google-research / cascades
Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference…
☆207 Updated last month