AsaCooperStickland/situational-awareness-evals

Measuring the situational awareness of language models

☆41

Alternatives and similar repositories for situational-awareness-evals

Users who are interested in situational-awareness-evals are comparing it to the repositories listed below.

choidami / inductive-oocr
☆16 · Mar 22, 2025 · Updated last year
aogara-ds / hoodwinked-website
A text-based game where language models learn to lie and to detect lies.
☆12 · Oct 4, 2023 · Updated 2 years ago
yamato-me / reinforcement-learning-replications
Reinforcement Learning Replications is a set of PyTorch implementations of reinforcement learning algorithms.
☆24 · Apr 4, 2026 · Updated 3 months ago
ApolloResearch / sample
Repository with sample code using Apollo's suggested engineering practices
☆15 · Dec 16, 2024 · Updated last year
felixbinder / introspection_self_prediction
Code for experiments on self-prediction as a way to measure introspection in LLMs
☆16 · Dec 10, 2024 · Updated last year
TeunvdWeij / sandbagging
☆21 · Nov 15, 2024 · Updated last year
alan-cooney / transformer-lens-starter-template
A quick way to get started with Transformer Lens
☆14 · Dec 13, 2023 · Updated 2 years ago
LRudL / evalugator
(Model-written) LLM evals library
☆19 · Dec 13, 2024 · Updated last year
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆48 · May 31, 2024 · Updated 2 years ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆58 · May 1, 2025 · Updated last year
LukeBailey181 / obfuscated-activations
Codebase for Obfuscated Activations Bypass LLM Latent-Space Defenses
☆31 · Feb 11, 2025 · Updated last year
aws-samples / end-2-end-3d-ml
This repository features Amazon SageMaker Ground Truth and explains how to ingest raw 3D point cloud data, label it, train a 3D object de…
☆13 · Jun 23, 2022 · Updated 4 years ago
redwoodresearch / remix_public
☆20 · Feb 17, 2023 · Updated 3 years ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆221 · Updated this week
lukasberglund / reversal_curse
☆313 · Nov 17, 2023 · Updated 2 years ago
fastforwardlabs / question_answering
CDSW/CML version of FF14
☆15 · Jan 29, 2021 · Updated 5 years ago
fullflu / bayes-by-backprop
☆13 · Nov 21, 2016 · Updated 9 years ago
redwoodresearch / Text-Steganography-Benchmark
Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography.
☆25 · Jan 26, 2024 · Updated 2 years ago
gt-big-data / solar-forecasting
An application that displays a map and graphs showing solar irradiance forecasts in solar farms in Georgia using data from the National S…
☆10 · Oct 15, 2021 · Updated 4 years ago
moirage / alignment-research-dataset
A dataset of alignment research and code to reproduce it
☆80 · Jun 22, 2023 · Updated 3 years ago
ayazhafiz / sherpa_41
Simple browser engine.
☆35 · Feb 15, 2020 · Updated 6 years ago
safety-research / safety-tooling
Inference API for many LLMs and other useful tools for empirical research
☆134 · May 29, 2026 · Updated 2 months ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆146 · Apr 23, 2026 · Updated 3 months ago
AlignmentResearch / learned-planner
Interpreting Learned Search and Planning: Reverse-engineering recurrent convolutional networks (DRC) that play Sokoban
☆21 · Jun 29, 2025 · Updated last year
usnistgov / agentdojo-inspect
A fork of AgentDojo compatible with Inspect.
☆17 · Oct 23, 2025 · Updated 9 months ago
peterhurford / squigglepy
Squiggle programming language for intuitive probabilistic estimation features in Python
☆84 · Jun 8, 2026 · Updated last month
chunhuizhang / llms_tuning
stay tuned.
☆18 · Jul 7, 2025 · Updated last year
aws-neuron / aws-neuron-sagemaker-samples
☆13 · Dec 19, 2025 · Updated 7 months ago
MichaelEinhorn / trl-textworld
☆13 · May 7, 2023 · Updated 3 years ago
Witalia008 / kaggle-public
Code from Machine Learning competitions on Kaggle
☆11 · Apr 1, 2021 · Updated 5 years ago
jplhughes / dotfiles
Easily deploy my zsh and tmux configuration on new machines. Includes local and remote aliases to improve workflow.
☆15 · Apr 23, 2026 · Updated 3 months ago
mariekemeelen / actib
This repository will soon contain all scripts and links to the annotated corpora of Tibetan.
☆14 · Feb 4, 2025 · Updated last year
goldblum / free-lunch
Implementation of experiments from The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
☆17 · May 14, 2023 · Updated 3 years ago
milesaturpin / cot-unfaithfulness
☆57 · Oct 23, 2023 · Updated 2 years ago
gladstoneai / POWERplay
☆11 · Oct 24, 2022 · Updated 3 years ago
awslabs / state-space-models-neuron
☆16 · Apr 11, 2025 · Updated last year
zhu-minjun / SafetyLock
Your finetuned model's back to its original safety standards faster than you can say "SafetyLock"!
☆11 · Oct 16, 2024 · Updated last year
leap-laboratories / PIZZA
An attribution library for LLMs
☆46 · Sep 17, 2024 · Updated last year
DFRobot / DFRobot_AS7341
We live in a colorful world, but how much do you really know about color? Your eyes may deceive you, while the sensors don’t lie. This AS7…
☆12 · Jan 20, 2022 · Updated 4 years ago