apartresearch / Neuron2Graph

Tools for exploring Transformer neuron behaviour, including input pruning and diversification.

☆20

Alternatives and similar repositories for Neuron2Graph

Users who are interested in Neuron2Graph are comparing it to the libraries listed below.

taylorwwebb / emergent_analogies_LLM
Code for 'Emergent Analogical Reasoning in Large Language Models'
☆51 Updated last year
bilal-chughtai / rep-theory-mech-interp
☆26 Updated 2 years ago
allenai / discoveryworld
A virtual environment for developing and evaluating automated scientific discovery agents.
☆166 Updated 4 months ago
METR / RE-Bench
☆95 Updated 3 months ago
csinva / interpretable-embeddings
Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024)
☆39 Updated 8 months ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆128 Updated 2 years ago
causalNLP / cladder
We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs.
☆119 Updated last year
meg-tong / sycophancy-eval
Datasets from the paper "Towards Understanding Sycophancy in Language Models"
☆86 Updated last year
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆30 Updated last year
allenai / discoverybench
Discovering Data-driven Hypotheses in the Wild
☆104 Updated last month
KihoPark / LLM_Categorical_Hierarchical_Representations
☆104 Updated 5 months ago
jlin816 / dialop
DialOp: Decision-oriented dialogue environments for collaborative language agents
☆109 Updated 8 months ago
victorvikram / ConceptARC
Materials for ConceptARC paper
☆98 Updated 9 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆54 Updated 3 months ago
neuroagents-lab / PyTorchTNN
Temporal Neural Networks
☆15 Updated last week
KoyenaPal / future-lens
Code and Data Repo for the CoNLL Paper: Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
☆18 Updated last year
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆149 Updated 6 months ago
ethancaballero / broken_neural_scaling_laws
Code Release for "Broken Neural Scaling Laws" (BNSL) paper
☆59 Updated last year
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆72 Updated last year
likenneth / othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
☆186 Updated 2 years ago
mechanistic-interpretability-grokking / progress-measures-paper
☆68 Updated 2 years ago
noanabeshima / matryoshka-saes
☆21 Updated 8 months ago
gabegrand / world-models
☆208 Updated 2 years ago
allenai / SciRIFF
Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding.
☆40 Updated 4 months ago
aypan17 / machiavelli
☆137 Updated 2 weeks ago
CarperAI / autocrit
A repository for transformer critique learning and generation
☆90 Updated last year
redwoodresearch / Easy-Transformer
☆121 Updated last year
p-lambda / incontext-learning
Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici…
☆108 Updated last year
causalNLP / corr2cause
Data and code for the Corr2Cause paper (ICLR 2024)
☆108 Updated last year
thestephencasper / everything-you-need
we got you bro
☆36 Updated last year