liziniu/policy_optimization

Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)

☆29

Alternatives and similar repositories for policy_optimization

Users who are interested in policy_optimization are comparing it to the libraries listed below.

syncdoth / Chain-of-Hindsight-PyTorch
Unofficial implementation of Chain of Hindsight (https://arxiv.org/abs/2302.02676) using PyTorch and Hugging Face Trainers.
☆11 · Apr 5, 2023 · Updated 3 years ago
liziniu / HyperDQN
Code for ICLR 2022 Paper (HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning)
☆12 · Nov 28, 2023 · Updated 2 years ago
liziniu / KnapsackRL
☆19 · Oct 30, 2025 · Updated 8 months ago
wzhouad / WPO
Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"
☆41 · Sep 24, 2024 · Updated last year
abbyvansoest / maxent
☆14 · May 30, 2019 · Updated 7 years ago
tangzhy / RealCritic
☆15 · Jan 27, 2025 · Updated last year
CharlieMat / GFN4Rec
Source code for paper "Generative Flow Network for Listwise Recommendation"
☆18 · Nov 8, 2024 · Updated last year
liziniu / GEM
Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)
☆58 · May 12, 2025 · Updated last year
zzq-bot / ROMANCE
Code for ROMANCE
☆14 · Oct 12, 2024 · Updated last year
robintyh1 / icml2021-pengqlambda
Revisiting Peng's Q(lambda) for Modern Reinforcement Learning
☆15 · Jul 23, 2021 · Updated 5 years ago
GXimingLu / IPA
Codebase for Inference-Time Policy Adapters
☆25 · Nov 3, 2023 · Updated 2 years ago
SuReLI / llrl
Lipschitz Lifelong RL
☆11 · Nov 6, 2020 · Updated 5 years ago
facebookresearch / rlfh-gen-div
This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity
☆50 · Jan 19, 2024 · Updated 2 years ago
MadryLab / journey-TRAK
Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models"
☆25 · Dec 12, 2023 · Updated 2 years ago
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆20 · May 24, 2024 · Updated 2 years ago
id9502 / Option-GAIL
☆12 · Dec 22, 2021 · Updated 4 years ago
StoneT2000 / trajectorytranslation
Code for Abstract-to-Executable Trajectory Translation for One Shot Task Generalization (ICML 2023)
☆23 · May 12, 2023 · Updated 3 years ago
KyunghyunLee / aes-rl
☆17 · Dec 12, 2020 · Updated 5 years ago
flowersteam / playground_env
Implementation of the Playground environment from the paper Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration.
☆11 · Mar 5, 2021 · Updated 5 years ago
sinwang20 / D2PO
[ACL 2025] "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning." https://arxiv.org/abs/2503.1…
☆18 · Jul 22, 2025 · Updated last year
wangjs9 / Aligned-dPM
PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach
☆32 · Nov 6, 2023 · Updated 2 years ago
ucl-dark / llm_debate
Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"
☆131 · Mar 22, 2024 · Updated 2 years ago
lingo-mit / lm-truthfulness
☆17 · Dec 21, 2023 · Updated 2 years ago
shizhediao / Black-Box-Prompt-Learning
Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models"
☆59 · Sep 7, 2023 · Updated 2 years ago
levilelis / h-levin
Levin tree search guided by both a policy and a heuristic function
☆19 · Jul 13, 2023 · Updated 3 years ago
Victorwz / LaViA
☆10 · Jul 13, 2024 · Updated 2 years ago
huanzhang12 / SA_PPO
[NeurIPS 2020 Spotlight] State-adversarial PPO for robust deep reinforcement learning
☆32 · Nov 18, 2021 · Updated 4 years ago
xionghuichen / RLAssistant
RLA is a tool for managing your RL experiments automatically
☆71 · Feb 7, 2023 · Updated 3 years ago
johnson7788 / AlphaMix
Reinforcement learning for quantitative finance
☆44 · Jul 20, 2022 · Updated 4 years ago
ryanxhr / DWBC
[ICML 2022] The official implementation of DWBC in "Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations"
☆35 · Jan 5, 2023 · Updated 3 years ago
jiangsy / mbpo_pytorch
☆30 · Mar 1, 2022 · Updated 4 years ago
mireshghallah / neighborhood-curvature-mia
☆27 · Aug 18, 2023 · Updated 2 years ago
uoe-agents / BRDiv
Codebase for BRDiv: Diverse teammate generation for ad hoc teamwork
☆13 · May 2, 2024 · Updated 2 years ago
dbsxodud-11 / PAG
Official Code for Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation (CVPR 2025)
☆15 · Apr 2, 2025 · Updated last year
YiqinYang / ICQ
Code accompanying the paper "Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning" (NeurIPS…
☆76 · Oct 18, 2022 · Updated 3 years ago
saiboxx / offline-reinforcement-learning
Exploring algorithms in the domain of offline reinforcement learning (REM, Ensemble-DQN, DQN, ...)
☆17 · Jul 7, 2020 · Updated 6 years ago
RulinShao / RAG-evaluation-harnesses
An evaluation suite for Retrieval-Augmented Generation (RAG).
☆25 · Apr 26, 2025 · Updated last year
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆120 · Jun 23, 2026 · Updated 3 weeks ago
real-stanford / ASPiRe
[NeurIPS 2022] ASPiRe: Adaptive Skill Priors for Reinforcement Learning
☆13 · Oct 19, 2022 · Updated 3 years ago