MinhxLe / subliminal-learning

☆104

Alternatives and similar repositories for subliminal-learning

Users who are interested in subliminal-learning are comparing it to the repositories listed below.

google-deepmind / mishax
☆143Updated 2 months ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆163Updated 7 months ago
METR / RE-Bench
☆117Updated last month
google-deepmind / dangerous-capability-evaluations
☆62Updated last month
METR / eval-analysis-public
Public repository containing METR's DVC pipeline for eval data analysis
☆129Updated 7 months ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆130Updated 3 years ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆212Updated last week
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆214Updated last year
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆89Updated last year
yash-srivastava19 / arrakis
Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
☆31Updated 7 months ago
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆34Updated 7 months ago
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆120Updated last week
AsaCooperStickland / situational-awareness-evals
Measuring the situational awareness of language models
☆39Updated last year
centerforaisafety / emergent-values
Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"
☆83Updated 8 months ago
METR / public-tasks
☆106Updated this week
emergent-misalignment / emergent-misalignment
☆226Updated 3 weeks ago
METR / task-standard
METR Task Standard
☆167Updated 9 months ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆31Updated 5 months ago
PrimeIntellect-ai / prime-environments
Training-Ready RL Environments + Evals
☆177Updated this week
poking-agents / modular-public
☆32Updated 5 months ago
safety-research / false-facts
☆24Updated 4 months ago
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28Updated last year
haizelabs / bijection-learning
☆26Updated last year
ahstat / episodic-memory-benchmark
Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…
☆59Updated last month
KihoPark / LLM_Categorical_Hierarchical_Representations
☆111Updated 9 months ago
allenai / infinigram-api
☆82Updated this week
rgreenblatt / arc_draw_more_samples_pub
Draw more samples
☆195Updated last year
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆79Updated 11 months ago
joshuacnf / Ctrl-G
☆104Updated 10 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119Updated last year