Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Our new work finds that approaches employing on-policy sampling or negative gradients outperform offline, maximum likelihood objectives.
☆32 · Apr 20, 2024 · Updated last year
Alternatives and similar repositories for understanding-rlhf
Users who are interested in understanding-rlhf are comparing it to the libraries listed below:
- Official Code Release for "Training a Generally Curious Agent" ☆45 · May 18, 2025 · Updated 9 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers ☆28 · Mar 1, 2025 · Updated last year
- ☆24 · Apr 3, 2025 · Updated 11 months ago
- Drop-in environment replacements that make your RL algorithm train faster. ☆21 · Jun 19, 2024 · Updated last year
- ☆16 · Feb 22, 2025 · Updated last year
- ☆10 · Nov 17, 2022 · Updated 3 years ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Apr 17, 2024 · Updated last year
- Counterfactual Evaluation and Learning for Interactive Systems: Foundations, Implementations, and Recent Advances ☆12 · Aug 14, 2022 · Updated 3 years ago
- ☆21 · Jul 21, 2025 · Updated 7 months ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆35 · Oct 3, 2024 · Updated last year
- ☆15 · Oct 4, 2024 · Updated last year
- Optim4RL is a Jax framework of learning to optimize for reinforcement learning. ☆28 · Nov 27, 2024 · Updated last year
- ☆12 · Dec 22, 2021 · Updated 4 years ago
- WebGym: Web-browser-based tasks for RL Agents ☆24 · Feb 4, 2021 · Updated 5 years ago
- ☆19 · Aug 4, 2025 · Updated 7 months ago
- Official implementation of the DECKARD Agent from the paper "Do Embodied Agents Dream of Pixelated Sheep?" ☆94 · May 23, 2023 · Updated 2 years ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 · Feb 22, 2024 · Updated 2 years ago
- Memory-Based Meta-Learning on Non-Stationary Distributions ☆17 · Mar 14, 2024 · Updated last year
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following" ☆31 · Jun 5, 2025 · Updated 9 months ago
- Fork of Flame repo for training of some new stuff in development ☆19 · Feb 27, 2026 · Updated last week
- ☆14 · May 31, 2022 · Updated 3 years ago
- MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series ☆17 · Sep 5, 2025 · Updated 6 months ago
- TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models ☆19 · Aug 17, 2025 · Updated 6 months ago
- ☆19 · Mar 16, 2025 · Updated 11 months ago
- MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following ☆16 · Oct 31, 2024 · Updated last year
- Experiment for Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning ☆26 · Jan 16, 2023 · Updated 3 years ago
- ☆25 · Dec 13, 2024 · Updated last year
- ☆23 · Jan 17, 2025 · Updated last year
- Official implementation of TBA for async LLM post-training. ☆29 · Nov 5, 2025 · Updated 4 months ago
- Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning ☆28 · Jul 14, 2025 · Updated 7 months ago
- Code to accompany the paper "The Information Geometry of Unsupervised Reinforcement Learning" ☆20 · Oct 6, 2021 · Updated 4 years ago
- Official Repository for Task-Circuit Quantization ☆24 · Jun 1, 2025 · Updated 9 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆27 · Oct 14, 2025 · Updated 4 months ago
- This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,… ☆56 · Dec 9, 2024 · Updated last year
- Implementation and datasets for "Training Language Models to Generate Quality Code with Program Analysis Feedback" ☆42 · Jul 21, 2025 · Updated 7 months ago
- Self-Supervised Alignment with Mutual Information ☆20 · May 24, 2024 · Updated last year
- ☆17 · Aug 1, 2025 · Updated 7 months ago
- Code for Abstract-to-Executable Trajectory Translation for One Shot Task Generalization (ICML 2023) ☆23 · May 12, 2023 · Updated 2 years ago
- ☆130 · Oct 1, 2024 · Updated last year