Xmaster6y / lczerolensLinks

🔬 Interpretability for Leela Chess Zero networks.

☆16

Alternatives and similar repositories for lczerolens

Users that are interested in lczerolens are comparing it to the libraries listed below

Sorting:

TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆293Updated 10 months ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆109Updated 3 months ago
curt-tigges / probity
☆19Updated 6 months ago
apartresearch / interpretability-starter
🧠 Starter templates for doing interpretability research
☆75Updated 2 years ago
michaelhodel / re-arc
Reverse Engineering the Abstraction and Reasoning Corpus
☆307Updated 7 months ago
TransluceAI / docent
☆50Updated 3 weeks ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆31Updated 4 months ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆238Updated 8 months ago
Butanium / nnterp
Unified access to Large Language Model modules using NNsight
☆49Updated last week
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆228Updated 2 months ago
michaelhodel / arc-dsl
Domain Specific Language for the Abstraction and Reasoning Corpus
☆300Updated last year
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆221Updated 10 months ago
jessicarumbelow / Backwards
☆84Updated last year
ARBORproject / arborproject.github.io
☆81Updated 7 months ago
jannik-brinkmann / hugginglens
TransformerLens + HuggingFace
☆11Updated last year
KempnerInstitute / chess-research
☆11Updated last year
LRudL / evalugator
(Model-written) LLM evals library
☆18Updated 10 months ago
Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆15Updated last month
ArthurConmy / Automatic-Circuit-Discovery
☆244Updated last year
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆683Updated this week
METR / public-tasks
☆104Updated this week
neelnanda-io / 1L-Sparse-Autoencoder
☆128Updated last year
redwoodresearch / mlab
Machine Learning for Alignment Bootcamp
☆79Updated 3 years ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆208Updated last week
HumanCompatibleAI / leela-interp
Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network"
☆26Updated last year
Mech-Interp / PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
☆14Updated last year
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆532Updated 2 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆218Updated this week
alignedai / HappyFaces
The Happy Faces Benchmark
☆15Updated 2 years ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆129Updated 3 years ago