openai/emergent-misalignment-persona-features

openai / emergent-misalignment-persona-features

☆58

Alternatives and similar repositories for emergent-misalignment-persona-features

Users that are interested in emergent-misalignment-persona-features are comparing it to the libraries listed below.

clarifying-EM / model-organisms-for-EM
Code repo for the model organisms and convergent directions of EM papers.
☆72 · Sep 22, 2025 · Updated 10 months ago
emergent-misalignment / emergent-misalignment
☆317 · Jan 12, 2026 · Updated 6 months ago
openai / sample-deep-research-mcp
Example MCP server compatible with Deep Research
☆66 · Jun 25, 2025 · Updated last year
safety-research / A3
☆19 · Dec 29, 2025 · Updated 7 months ago
safety-research / persona_vectors
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
☆454 · Apr 22, 2026 · Updated 3 months ago
openai / openai-reflect
Physical AI Assistant that illuminates your life
☆192 · Mar 27, 2026 · Updated 4 months ago
verses-xyz / pluriverse
🌌 Towards a Digital Pluriverse: “a world where many worlds may fit”
☆15 · Feb 20, 2022 · Updated 4 years ago
openai / emoclassifiers
Classifiers for "Investigating Affective Use and Emotional Well-being in ChatGPT"
☆53 · Jul 3, 2025 · Updated last year
davidbau / sidn-handbook
The Structure and Interpretation of Deep Networks Handbook
☆14 · Dec 14, 2024 · Updated last year
longtermrisk / openweights
A Python SDK for LLM finetuning and inference on Runpod infrastructure
☆30 · May 12, 2026 · Updated 2 months ago
llylly / RANUM
[ICSE 2023] Differentiable interpretation and failure-inducing input generation for neural network numerical bugs.
☆13 · Jan 5, 2024 · Updated 2 years ago
XuchanBao / behavioral-self-awareness
☆37 · Feb 20, 2025 · Updated last year
kyegomez / OpenStrawberry
An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO
☆31 · Updated this week
safety-research / auditing-agents
☆29 · Jul 1, 2026 · Updated 3 weeks ago
Doriandarko / OraclesGPT
☆11 · Aug 22, 2023 · Updated 2 years ago
openai / openai-builder-lab
☆54 · Feb 3, 2025 · Updated last year
nishantsubramani / steering_vectors
Steering Vector Repo from "Extracting Latent Steering Vectors from Pretrained Language Models" - ACL2022 Findings
☆11 · Mar 14, 2022 · Updated 4 years ago
openai / monitorability-evals
Open-sourced evaluation suite from the Monitoring Monitorability paper
☆88 · Jun 11, 2026 · Updated last month
OthersideAI / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆12 · Nov 27, 2023 · Updated 2 years ago
infi-coder / infibench-evaluation-harness
The Infibench variant of bigcode-evaluation-harness --- a framework for the evaluation of autoregressive code generation language models.
☆14 · Oct 19, 2024 · Updated last year
flowersteam / EAGER
☆10 · Oct 11, 2022 · Updated 3 years ago
google-deepmind / agent_debugger
Causal Analysis of Agent Behavior for AI Safety
☆21 · Jun 27, 2023 · Updated 3 years ago
microsoft / kata-containers
Kata Containers is an open source project and community working to build a standard implementation of lightweight Virtual Machines (VMs) …
☆94 · Updated this week
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆146 · Apr 23, 2026 · Updated 3 months ago
interp-reasoning / thought-anchors
⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.
☆137 · Oct 27, 2025 · Updated 9 months ago
shizhediao / automate-cot
Source code for the paper "Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data"
☆20 · Feb 24, 2024 · Updated 2 years ago
tcwangshiqi-columbia / GCP-CROWN
The official repo for GCP-CROWN paper
☆13 · Sep 26, 2022 · Updated 3 years ago
microsoft / rats
Rats is a collection of tools to help researchers define and run experiments. It is designed to be a modular and extensible framework cur…
☆33 · Updated this week
bramses / gpt-to-chatgpt-py
Convert a regular GPT call into a ChatGPT call
☆14 · Mar 2, 2023 · Updated 3 years ago
internet-development / www
The old version of https://internet.dev
☆23 · Jan 22, 2025 · Updated last year
eth-sri / deepg
Certifying Geometric Robustness of Neural Networks
☆16 · Mar 24, 2023 · Updated 3 years ago
kyegomez / MLXTransformer
Simple Implementation of a Transformer in the new framework MLX by Apple
☆19 · Nov 18, 2024 · Updated last year
Jacob-yen / GRIST
This is the implementation repository of our incoming ESEC/FSE 2021 paper: Exposing Numerical Bugs in Deep Learning via GradientBack-prop…
☆15 · Oct 16, 2022 · Updated 3 years ago
infi-coder / infibench-evaluator
The evaluation framework for the InfiCoder-Eval benchmark.
☆21 · Jul 22, 2024 · Updated 2 years ago
Harvard-CS-2881 / harvard-cs-2881-hw0
harvard-cs-2881-classroom-hw0-c2881-hw0 created by GitHub Classroom
☆16 · Jul 26, 2025 · Updated last year
webscraping-ai / webscraping-ai-python
Python client for https://webscraping.ai API. It provides Chrome JS rendering, rotating proxies and HTML parsing for web scraping.
☆17 · Jul 17, 2026 · Updated last week
anthropic-experimental / automated-auditing
Prompts used in the Automated Auditing Blog Post
☆167 · Jul 24, 2025 · Updated last year
viemccoy / grimoire
Synthetic Hypertext and Homomorphic Catalogue
☆16 · Dec 28, 2024 · Updated last year
rocicorp / zero-sqlite3
better-sqlite3 built against the sqlite3 bedrock branch. +cli, +stmt_scanstatus, -loadable extensions.
☆18 · Updated this week