facebookresearch / rlfh-gen-div
This is the code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity".
☆47Jan 19, 2024Updated 2 years ago
Alternatives and similar repositories for rlfh-gen-div
Users who are interested in rlfh-gen-div compare it to the repositories listed below.
- Rewarded soups official implementation☆62Sep 27, 2023Updated 2 years ago
- Directed masked autoencoders☆14Feb 5, 2026Updated last week
- [ACL'24 Findings] Official code for "TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback"☆12Dec 6, 2024Updated last year
- ☆21Jun 22, 2025Updated 7 months ago
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)☆28Dec 19, 2023Updated 2 years ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…☆32Apr 20, 2024Updated last year
- Measuring the Signal to Noise Ratio in Language Model Evaluation☆28Aug 19, 2025Updated 5 months ago
- RewardBench: the first evaluation tool for reward models.☆687Jan 31, 2026Updated 2 weeks ago
- JudgeLRM: Large Reasoning Models as a Judge☆41Jan 29, 2026Updated 2 weeks ago
- ☆18Jul 24, 2023Updated 2 years ago
- ☆14Jul 24, 2024Updated last year
- [NeurIPS 2025] Reasoning Models Better Express Their Confidence☆22Nov 19, 2025Updated 2 months ago
- ☆160Nov 23, 2024Updated last year
- Simple notebooks to learn diffusion models on toy datasets☆17Feb 9, 2023Updated 3 years ago
- Recipes to train reward model for RLHF.☆1,512Apr 24, 2025Updated 9 months ago
- Code release of paper "ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning" (NeurIPS 2023)☆17Dec 30, 2023Updated 2 years ago
- ☆16Mar 22, 2024Updated last year
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique☆18Aug 22, 2024Updated last year
- ☆20Sep 13, 2023Updated 2 years ago
- ☆19Mar 1, 2023Updated 2 years ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct☆191Jan 16, 2025Updated last year
- A minimal implementation of Drifting Models for 2D toy data. Unlike diffusion/flow models that iterate at inference, drifting models evo…☆56Updated this week
- Un-*** 50 billions multimodality dataset☆23Sep 14, 2022Updated 3 years ago
- ☆26May 30, 2023Updated 2 years ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin…☆51May 4, 2024Updated last year
- ☆25May 16, 2024Updated last year
- VQVAE | VAE | GumbelVAE | PixelCNN☆21Jun 15, 2020Updated 5 years ago
- Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"☆66Apr 24, 2024Updated last year
- A recipe for online RLHF and online iterative DPO.☆539Dec 28, 2024Updated last year
- Code and links for over 25,000 trained Atari agents☆98Aug 22, 2024Updated last year
- Self-Alignment with Principle-Following Reward Models☆169Sep 18, 2025Updated 4 months ago
- Official Code Release for "Training a Generally Curious Agent"☆45May 18, 2025Updated 8 months ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Aug 25, 2023Updated 2 years ago
- Minimal RLHF implementation built on top of minGPT.☆32Jul 4, 2024Updated last year
- Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Langu…☆354Jun 18, 2023Updated 2 years ago
- Official codebase for "The Generalization Gap in Offline Reinforcement Learning" accepted to ICLR 2024☆28Aug 9, 2024Updated last year
- ☆32Mar 31, 2023Updated 2 years ago
- ☆29Apr 30, 2024Updated last year
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆202Apr 17, 2025Updated 10 months ago