Asap7772 / understanding-rlhf
Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Our new work finds that approaches employing on-policy sampling or negative gradients outperform offline, maximum likelihood objectives.
☆32 Apr 20, 2024 Updated last year
Alternatives and similar repositories for understanding-rlhf
Users who are interested in understanding-rlhf are comparing it to the libraries listed below.
- Official Code Release for "Training a Generally Curious Agent" ☆44 May 18, 2025 Updated 8 months ago
- ☆24 Apr 3, 2025 Updated 10 months ago
- Drop-in environment replacements that make your RL algorithm train faster. ☆21 Jun 19, 2024 Updated last year
- ☆16 Feb 22, 2025 Updated 11 months ago
- (NeurIPS '22) LISA: Learning Interpretable Skill Abstractions - A framework for unsupervised skill learning using Imitation ☆29 Feb 22, 2023 Updated 2 years ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 Apr 17, 2024 Updated last year
- ☆14 Oct 4, 2024 Updated last year
- ☆16 Jul 16, 2024 Updated last year
- ☆21 Jul 21, 2025 Updated 6 months ago
- ☆35 Apr 12, 2024 Updated last year
- ☆12 Dec 22, 2021 Updated 4 years ago
- WebGym: Web-browser-based tasks for RL Agents ☆23 Feb 4, 2021 Updated 5 years ago
- ☆19 Aug 4, 2025 Updated 6 months ago
- Official implementation of the DECKARD Agent from the paper "Do Embodied Agents Dream of Pixelated Sheep?" ☆94 May 23, 2023 Updated 2 years ago
- Memory-Based Meta-Learning on Non-Stationary Distributions ☆17 Mar 14, 2024 Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Feb 22, 2024 Updated last year
- ☆14 May 31, 2022 Updated 3 years ago
- Fork of Flame repo for training of some new stuff in development ☆19 Jan 5, 2026 Updated last month
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following" ☆31 Jun 5, 2025 Updated 8 months ago
- TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models ☆19 Aug 17, 2025 Updated 5 months ago
- Experiment for Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning ☆26 Jan 16, 2023 Updated 3 years ago
- ☆19 Mar 16, 2025 Updated 10 months ago
- MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following ☆16 Oct 31, 2024 Updated last year
- Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning ☆28 Jul 14, 2025 Updated 7 months ago
- Code to accompany the paper "The Information Geometry of Unsupervised Reinforcement Learning" ☆20 Oct 6, 2021 Updated 4 years ago
- Official implementation of TBA for async LLM post-training. ☆28 Nov 5, 2025 Updated 3 months ago
- [ICML 2025] Official code of "AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization" ☆29 Jan 10, 2026 Updated last month
- Self-Supervised Alignment with Mutual Information ☆20 May 24, 2024 Updated last year
- ☆17 Aug 1, 2025 Updated 6 months ago
- Code for Abstract-to-Executable Trajectory Translation for One Shot Task Generalization (ICML 2023) ☆23 May 12, 2023 Updated 2 years ago
- ☆130 Oct 1, 2024 Updated last year
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity ☆47 Jan 19, 2024 Updated 2 years ago
- [ICLR 2026] PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, co… ☆34 Sep 9, 2025 Updated 5 months ago
- Code and data for "ImgTrojan: Jailbreaking Vision-Language Models with ONE Image" ☆24 Mar 26, 2025 Updated 10 months ago
- RFTT: Reasoning with Reinforced Functional Token Tuning ☆29 Feb 4, 2026 Updated last week
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 ☆143 Feb 24, 2025 Updated 11 months ago
- Natural Language Reinforcement Learning ☆101 Jul 30, 2025 Updated 6 months ago
- Codebase for "Uni[MASK]: Unified Inference in Sequential Decision Problems" ☆57 Jul 3, 2024 Updated last year
- ☆24 Jan 28, 2025 Updated last year