centerforaisafety / emergent-values
Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"
☆76Updated 8 months ago
Alternatives and similar repositories for emergent-values
Users who are interested in emergent-values are comparing it to the repositories listed below
- ☆80Updated 2 weeks ago
 - Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles☆59Updated 5 months ago
 - Functional Benchmarks and the Reasoning Gap☆89Updated last year
 - Code for ExploreTom☆86Updated 4 months ago
 - Source code for the collaborative reasoner research project at Meta FAIR.☆103Updated 6 months ago
 - Public repository containing METR's DVC pipeline for eval data analysis☆124Updated 6 months ago
 - Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks"☆53Updated 2 months ago
 - Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆99Updated last week
 - A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you…☆79Updated 10 months ago
 - ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo)☆88Updated this week
 - ☆100Updated 3 months ago
 - Learning to route instances for Human vs AI Feedback (ACL Main '25)☆25Updated 3 months ago
 - Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.☆57Updated 5 months ago
 - Open Source Replication of Anthropic's Alignment Faking Paper☆50Updated 6 months ago
 - A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.☆170Updated this week
 - ☆55Updated 11 months ago
 - Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆60Updated last year
 - [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆79Updated 7 months ago
 - ☆43Updated last year
 - A framework for pitting LLMs against each other in an evolving library of games ⚔☆34Updated 6 months ago
 - Official Repo for CRMArena and CRMArena-Pro☆121Updated 4 months ago
 - Mixing Language Models with Self-Verification and Meta-Verification☆109Updated 10 months ago
 - Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"☆62Updated 6 months ago
 - Super basic implementation (gist-like) of RLMs with REPL environments.☆228Updated 2 weeks ago
 - An automated tool for discovering insights from research paper corpora☆138Updated last year
 - Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆91Updated 9 months ago
 - Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation☆48Updated last year
 - QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods.☆24Updated last week
 - ☆142Updated last month
 - ☆58Updated 4 months ago