redwoodresearch / remix_publicLinks

☆20

Alternatives and similar repositories for remix_public

Users that are interested in remix_public are comparing it to the libraries listed below

Sorting:

EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆239Updated 10 months ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆214Updated last week
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆302Updated 11 months ago
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆232Updated 3 months ago
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆53Updated 3 years ago
redwoodresearch / rust_circuit_public
☆65Updated 2 years ago
LRudL / evalugator
(Model-written) LLM evals library
☆18Updated 11 months ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆116Updated 5 months ago
redwoodresearch / mlab
Machine Learning for Alignment Bootcamp
☆81Updated 3 years ago
EleutherAI / project-menu
See the issue board for the current status of active and prospective projects!
☆65Updated 3 years ago
neelnanda-io / 1L-Sparse-Autoencoder
☆132Updated 2 years ago
METR / task-standard
METR Task Standard
☆168Updated 9 months ago
apartresearch / interpretability-starter
🧠 Starter templates for doing interpretability research
☆75Updated 2 years ago
ArthurConmy / Automatic-Circuit-Discovery
☆255Updated last year
callummcdougall / sae_visualizer
☆29Updated last year
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆72Updated 2 years ago
amack315 / unsupervised-steering-vectors
☆36Updated last year
alignedai / HappyFaces
The Happy Faces Benchmark
☆15Updated 2 years ago
collin-burns / discovering_latent_knowledge
☆283Updated last year
anthropics / PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
☆198Updated 3 years ago
redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆14Updated 3 years ago
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols - for running control experiments.
☆129Updated this week
apple / ml-np-rasp
☆19Updated last year
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆228Updated 11 months ago
Mech-Interp / PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
☆14Updated last year
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆248Updated last year
thestephencasper / everything-you-need
we got you bro
☆36Updated last year
anthropics / evals
☆313Updated last year
Kiv / fancy_einsum
Einsum with einops style variable names
☆18Updated last year
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆549Updated 3 months ago