XuchanBao/behavioral-self-awareness

☆37

Alternatives and similar repositories for behavioral-self-awareness

Users who are interested in behavioral-self-awareness are comparing it to the repositories listed below.

KempnerInstitute / llm_uncertainty
Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"
☆11 · Jul 18, 2026 · Updated last week
Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆17 · Oct 21, 2025 · Updated 9 months ago
matchten / LoRA-Models-for-SAEs
Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"
☆17 · Mar 31, 2025 · Updated last year
tim-hua-01 / steering-eval-awareness-public
☆17 · Mar 16, 2026 · Updated 4 months ago
rgreenblatt / model_organism_public
☆15 · Jun 17, 2025 · Updated last year
JoshEngels / SAE-Dark-Matter
Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders"
☆23 · Feb 6, 2025 · Updated last year
felixbinder / introspection_self_prediction
Code for experiments on self-prediction as a way to measure introspection in LLMs
☆16 · Dec 10, 2024 · Updated last year
Gwinhen / DRUPE
Distribution Preserving Backdoor Attack in Self-supervised Learning
☆20 · Jan 27, 2024 · Updated 2 years ago
safety-research / false-facts
☆51 · Jul 4, 2025 · Updated last year
velocityCavalry / CREPE
An original implementation of the paper "CREPE: Open-Domain Question Answering with False Presuppositions"
☆16 · Nov 5, 2024 · Updated last year
PRIS-CV / MSSRM
An implementation of the MSSRM method
☆10 · Mar 23, 2023 · Updated 3 years ago
ThirdAIResearch / Dessert
DESSERT Efficiently Searches Sets of Embeddings via Retrieval Tables
☆18 · Feb 21, 2024 · Updated 2 years ago
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆29 · Nov 20, 2024 · Updated last year
emergent-misalignment / emergent-misalignment
☆317 · Jan 12, 2026 · Updated 6 months ago
simple-stories / simple_stories_train
Trains small LMs. Designed for training on SimpleStories
☆14 · Sep 15, 2025 · Updated 10 months ago
facebookresearch / decrypto
Implementation of the Decrypto benchmark for multi-agent reasoning and theory of mind.
☆22 · Jan 19, 2026 · Updated 6 months ago
safety-research / safety-tooling
Inference API for many LLMs and other useful tools for empirical research
☆134 · May 29, 2026 · Updated 2 months ago
PKU-Alignment / ProgressGym
Alignment with a millennium of moral progress. Spotlight@NeurIPS 2024 Track on Datasets and Benchmarks.
☆25 · Mar 30, 2025 · Updated last year
rgreenblatt / control-evaluations
☆25 · May 25, 2024 · Updated 2 years ago
Lossfunk / KernelBench-v2
KernelBench v2: Can LLMs Write GPU Kernels? - Benchmark with Torch -> Triton (and more!) problems
☆24 · Jul 4, 2025 · Updated last year
TeunvdWeij / sandbagging
☆21 · Nov 15, 2024 · Updated last year
am-bean / lingOly
A benchmark for language models based on the UK Linguistics Olympiad
☆12 · Mar 3, 2025 · Updated last year
jcmgray / einsum_bmm
einsum via batch matrix multiply
☆15 · Nov 29, 2023 · Updated 2 years ago
bluedotimpact / bluedot
✨ Monorepo containing most of BlueDot Impact's custom software.
☆28 · Updated this week
alan-cooney / transformer-lens-starter-template
A quick way to get started with Transformer Lens
☆14 · Dec 13, 2023 · Updated 2 years ago
SORRY-Bench / sorry-bench
Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025)
☆83 · Mar 1, 2025 · Updated last year
aryamanarora / causalgym
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
☆54 · Nov 30, 2024 · Updated last year
PrasannS / rlhf-length-biases
☆27 · Mar 13, 2024 · Updated 2 years ago
Ali-Omrani / CCR
Conceptual Construct Representations
☆11 · Feb 23, 2023 · Updated 3 years ago
hppRC / llm-translator
Mixtral-based Ja-En (En-Ja) Translation model
☆20 · Jan 6, 2025 · Updated last year
aisa-group / promptinject-agent-skills
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
☆21 · Jul 2, 2026 · Updated 3 weeks ago
Aloriosa / srmt
The original Shared Recurrent Memory Transformer implementation
☆36 · Jul 11, 2025 · Updated last year
TruthfulAI-research / negation_neglect
Code for Negation Neglect
☆16 · May 22, 2026 · Updated 2 months ago
choidami / inductive-oocr
☆16 · Mar 22, 2025 · Updated last year
arman-aminian / network-anomaly-detection
Rahnema Final Project - Network anomaly detection
☆11 · Jul 22, 2021 · Updated 5 years ago
ApolloResearch / sample
Repository with sample code using Apollo's suggested engineering practices
☆15 · Dec 16, 2024 · Updated last year
rsharifnasab / go-linkstate-simulation
Simulates the link-state routing algorithm
☆10 · Nov 6, 2023 · Updated 2 years ago
amirhosseinNouri / meet-auto-admit
Use this extension to automate Google Meet admission.
☆11 · Mar 1, 2021 · Updated 5 years ago
amirhallaji / Computational-Intelligence
☆11 · Mar 12, 2021 · Updated 5 years ago