PKU-Alignment / ProgressGym
Alignment with a millennium of moral progress. Spotlight@NeurIPS 2024 Track on Datasets and Benchmarks.
☆13 Updated this week
Related projects
Alternatives and complementary repositories for ProgressGym
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆39 Updated 3 months ago
- ☆26 Updated last year
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆39 Updated last month
- [ACL 2024] Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models ☆15 Updated 4 months ago
- How to create rational LLM-based agents? Using game-theoretic workflows! ☆29 Updated this week
- Official repository for Decentralized Arena via Collective LLM Intelligence ☆8 Updated last month
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆98 Updated 2 months ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" ☆86 Updated this week
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆26 Updated 2 months ago
- [ICML 2024] Language Models Represent Beliefs of Self and Others ☆26 Updated 2 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆98 Updated last month
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆84 Updated 8 months ago
- This repo contains code for our NeurIPS 2023 spotlight paper: Evaluating and Inducing Personality in Pre-trained Language Models ☆47 Updated 11 months ago
- ☆21 Updated 3 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Updated last year
- ☆18 Updated 5 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆63 Updated last year
- ☆74 Updated 4 months ago
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment ☆38 Updated last year
- ☆24 Updated 7 months ago
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity ☆38 Updated 10 months ago
- ☆72 Updated 8 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆127 Updated 2 months ago
- Paper collection of methods that use language to interact with environments, including interaction with the real world, simulated worlds or WWW… ☆123 Updated last year
- Evaluating the Moral Beliefs Encoded in LLMs ☆21 Updated 10 months ago
- Dataset Reset Policy Optimization ☆28 Updated 7 months ago
- Directional Preference Alignment ☆51 Updated 2 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆50 Updated 6 months ago
- ☆36 Updated 3 months ago
- Augmenting Statistical Models with Natural Language Parameters ☆17 Updated 2 months ago