thestephencasper / everything-you-needLinks

we got you bro

☆36

Alternatives and similar repositories for everything-you-need

Users that are interested in everything-you-need are comparing it to the libraries listed below

Sorting:

apartresearch / interpretability-starter
🧠 Starter templates for doing interpretability research
☆74Updated 2 years ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆117Updated 5 months ago
bilal-chughtai / rep-theory-mech-interp
☆27Updated 2 years ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆130Updated 3 years ago
neelnanda-io / 1L-Sparse-Autoencoder
☆132Updated 2 years ago
callummcdougall / sae_visualizer
☆29Updated last year
redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆14Updated 3 years ago
annahdo / implementing_activation_steering
A collection of different ways to implement accessing and modifying internal model activations for LLMs
☆19Updated last year
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆55Updated 7 months ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆32Updated 5 months ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆239Updated 10 months ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆231Updated 11 months ago
google-deepmind / mishax
☆144Updated 3 months ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆214Updated last week
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆302Updated 11 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆63Updated last year
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28Updated last year
safety-research / safety-tooling
Inference API for many LLMs and other useful tools for empirical research
☆83Updated 2 weeks ago
google-deepmind / dangerous-capability-evaluations
☆62Updated 2 months ago
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆53Updated 3 years ago
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆72Updated 2 years ago
collin-burns / discovering_latent_knowledge
☆283Updated last year
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols - for running control experiments.
☆132Updated this week
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆130Updated 9 months ago
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆248Updated last year
tilde-research / sieve
Applying SAEs for fine-grained control
☆24Updated 11 months ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆164Updated 7 months ago
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆233Updated 3 months ago
redwoodresearch / mlab
Machine Learning for Alignment Bootcamp
☆81Updated 3 years ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆231Updated last week