jessicarumbelow / Backwards

☆85

Alternatives and similar repositories for Backwards

Users who are interested in Backwards are comparing it to the libraries listed below.

EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆239 Updated 10 months ago
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆248 Updated last year
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆302 Updated 11 months ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆214 Updated last week
METR / public-tasks
☆108 Updated 2 weeks ago
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆72 Updated 2 years ago
r-three / git-theta
git extension for {collaborative, communal, continual} model development
☆216 Updated last year
anthropics / evals
☆315 Updated last year
collin-burns / discovering_latent_knowledge
☆283 Updated last year
neelnanda-io / 1L-Sparse-Autoencoder
☆132 Updated 2 years ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆231 Updated 11 months ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆130 Updated 3 years ago
anthropics / PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
☆198 Updated 3 years ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆117 Updated 5 months ago
callummcdougall / sae_visualizer
☆29 Updated last year
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆550 Updated 4 months ago
craffel / llm-seminar
Seminar on Large Language Models (COMP790-101 at UNC Chapel Hill, Fall 2022)
☆313 Updated 3 years ago
PAIR-code / interpretability
PAIR.withgoogle.com and friend's work on interpretability methods
☆215 Updated last week
KihoPark / LLM_Categorical_Hierarchical_Representations
☆111 Updated 9 months ago
srush / raspy
An interactive exploration of Transformer programming.
☆270 Updated 2 years ago
aypan17 / machiavelli
☆143 Updated 4 months ago
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆53 Updated 3 years ago
likenneth / othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
☆191 Updated 2 years ago
METR / task-standard
METR Task Standard
☆168 Updated 10 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆130 Updated 9 months ago
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆216 Updated last year
google-research / cascades
Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference…
☆215 Updated 6 months ago
ArthurConmy / Automatic-Circuit-Discovery
☆258 Updated last year
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols - for running control experiments.
☆132 Updated this week
google-deepmind / mishax
☆144 Updated 3 months ago