centerforaisafety / emergent-values
Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"
☆49 Updated 2 months ago
Alternatives and similar repositories for emergent-values
Users who are interested in emergent-values are comparing it to the repositories listed below
- ☆57 Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 8 months ago
- Functional Benchmarks and the Reasoning Gap ☆86 Updated 7 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆65 Updated 2 months ago
- ☆48 Updated 6 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆37 Updated 2 weeks ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆74 Updated last month
- ☆54 Updated 3 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆54 Updated 5 months ago
- Evaluating LLMs with CommonGen-Lite ☆90 Updated last year
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆31 Updated last week
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆90 Updated 3 months ago
- ☆114 Updated 2 months ago
- Learning to route instances for Human vs AI Feedback (ACL 2025 Main) ☆23 Updated this week
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆42 Updated last month
- ☆11 Updated 9 months ago
- ☆48 Updated last year
- Code for ExploreTom ☆83 Updated 5 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆103 Updated 5 months ago
- ☆50 Updated 5 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 Updated 10 months ago
- Testing paligemma2 finetuning on reasoning dataset ☆18 Updated 4 months ago
- An attribution library for LLMs ☆41 Updated 8 months ago
- Collection of LLM completions for reasoning-gym task datasets ☆20 Updated last week
- Verifiers for LLM Reinforcement Learning ☆50 Updated last month
- ☆24 Updated last year
- ☆65 Updated 2 months ago
- ☆74 Updated 3 weeks ago
- look how they massacred my boy ☆63 Updated 7 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆53 Updated last month