centerforaisafety/emergent-values

Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"

☆90

Alternatives and similar repositories for emergent-values

Users that are interested in emergent-values are comparing it to the libraries listed below.

chonghin33 / lcm-1.13-whitepaper
This project contains the original white paper for Language Construct Modeling (LCM) v1.13, authored by Vincent Shing Hin Chong. It intro…
☆15 · Jul 23, 2025 · Updated 11 months ago
SprocketLab / roboshot
☆24 · May 30, 2024 · Updated 2 years ago
annahdo / implementing_activation_steering
A collection of different ways to implement accessing and modifying internal model activations for LLMs
☆24 · Oct 18, 2024 · Updated last year
ethz-spylab / unlearning-vs-safety
☆27 · Oct 6, 2024 · Updated last year
UCSB-NLP-Chang / causal_unlearn
[EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective"
☆34 · Jul 22, 2024 · Updated last year
angrave / CS341-Lectures-SP24
CS341 for Spring 2024
☆11 · Jul 15, 2024 · Updated last year
gotohuman / langgraph-js-mcp
☆18 · Mar 25, 2025 · Updated last year
CogComp / reasoning-eval
☆25 · Nov 7, 2024 · Updated last year
entireio / skills
✨ Cross-agent skills that help coding agents use Entire context from Checkpoints, sessions, and git history to search past work, explain …
☆187 · Jun 30, 2026 · Updated last week
airtai / prompt-leakage-probing
☆15 · Mar 3, 2025 · Updated last year
LaunchPlatform / marketplace
Marketplace ML experiment - training without backprop
☆28 · Sep 9, 2025 · Updated 10 months ago
ArthurConmy / MishformerLens
MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…
☆10 · Oct 7, 2024 · Updated last year
microsoft / AdversarialGMM
Minimax Estimation of Conditional Moment Models
☆32 · Jun 12, 2023 · Updated 3 years ago
centerforaisafety / Intro_to_ML_Safety
☆78 · May 31, 2023 · Updated 3 years ago
felixbinder / introspection_self_prediction
Code for experiments on self-prediction as a way to measure introspection in LLMs
☆16 · Dec 10, 2024 · Updated last year
Phylliida / OpenClio
Open source version of Anthropic's Clio: A system for privacy-preserving insights into real-world AI use
☆79 · Aug 19, 2025 · Updated 10 months ago
onesocialweb / osw-web
Web client (static HTML and Javascript) built with Google Web Toolkit
☆44 · Jul 21, 2011 · Updated 14 years ago
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆104 · Sep 21, 2023 · Updated 2 years ago
kxcloud / gradient-routing
☆11 · Dec 4, 2024 · Updated last year
ninodimontalcino / moralchoice
Evaluating the Moral Beliefs Encoded in LLMs
☆38 · Dec 17, 2024 · Updated last year
natolambert / job-search-viz
A tool for visualization of complex job searches.
☆20 · Jul 8, 2022 · Updated 4 years ago
night-chen / DyGen
[KDD'23] This is the code repo for our KDD'23 paper "DyGen: Learning from Noisy Labels via Dynamics-Enhanced Generative Modeling".
☆11 · Jun 14, 2023 · Updated 3 years ago
AngelaZZZ-611 / reasoning_models_probing
☆21 · May 14, 2026 · Updated last month
AmitLevinson / 30daymapchallenge
☆12 · Nov 15, 2022 · Updated 3 years ago
chrisliu298 / llm-unlearn-eco
[NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts
☆40 · Sep 26, 2024 · Updated last year
princeton-polaris-lab / Evaluating-Durable-Safeguards
[ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs
☆13 · Jun 20, 2025 · Updated last year
rishub-tamirisa / tamper-resistance
[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
☆68 · Jun 9, 2025 · Updated last year
jamiequint / qmd
mini cli search engine for your docs, knowledge bases, meeting notes, whatever. Tracking current sota approaches while being all local
☆29 · Mar 9, 2026 · Updated 4 months ago
pizofreude / metaprompt
Metaprompt is an AI-powered prompt generator developed by Anthropic. This is the unofficial Metaprompt Community Github repo. All PRs are…
☆14 · Mar 19, 2024 · Updated 2 years ago
loftusa / owls
Subliminal learning in LLMs: language models can transmit hidden preferences through seemingly unrelated training data.
☆24 · Nov 9, 2025 · Updated 8 months ago
rtaori / data_feedback
Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases"
☆18 · Sep 9, 2022 · Updated 3 years ago
AmourWaltz / UAlign
Project of ACL 2025 "UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models"
☆15 · Mar 25, 2025 · Updated last year
UCSC-REAL / FLAT
[ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data
☆14 · Feb 26, 2025 · Updated last year
amazon-science / factual-confidence-of-llms
Code for paper "Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators"
☆17 · Dec 4, 2024 · Updated last year
QuixiAI / bridge-protocol
☆20 · May 30, 2025 · Updated last year
dongjunKANG / VIM
☆11 · Oct 16, 2023 · Updated 2 years ago
ruvnet / agentic-preview
Agentic Preview is an asynchronous FastAPI backend service that allows users to deploy preview environments using Fly.io.
☆13 · Oct 11, 2024 · Updated last year
anishathalye / anishathalye
A self-updating GitHub profile 🐯
☆17 · Updated this week
hulop / NavCogIOS
NavCog is an example app of blelocpp library aimed specifically for the blind to help those people “explore” the world without vision. No…
☆10 · Jan 18, 2017 · Updated 9 years ago