kvablack/ddpo-pytorch

DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support

☆768

Alternatives and similar repositories for ddpo-pytorch

Users who are interested in ddpo-pytorch are comparing it to the libraries listed below.

jannerm / ddpo
Code for the paper "Training Diffusion Models with Reinforcement Learning"
☆574 · Jul 5, 2023 · Updated 3 years ago
mihirp1998 / AlignProp
AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more samp…
☆324 · Nov 1, 2024 · Updated last year
yk7333 / d3po
[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
☆244 · Apr 6, 2024 · Updated 2 years ago
SalesforceAIResearch / DiffusionDPO
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
☆706 · Jun 2, 2026 · Updated last month
zai-org / ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
☆1,695 · Oct 29, 2025 · Updated 9 months ago
tgxs002 / HPSv2
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
☆677 · May 24, 2024 · Updated 2 years ago
yifan123 / flow_grpo
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
☆2,437 · May 7, 2026 · Updated 2 months ago
kvablack / LLaVA-server
☆22 · Oct 20, 2023 · Updated 2 years ago
Shentao-YANG / Dense_Reward_T2I
Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).
☆39 · May 9, 2024 · Updated 2 years ago
yuvalkirstain / PickScore
☆601 · Dec 21, 2024 · Updated last year
tmabraham / ddpo-pytorch
Reproduction of DDPO paper (RLHF for diffusion)
☆94 · Sep 20, 2023 · Updated 2 years ago
RockeyCoss / SPO
[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
☆271 · Apr 7, 2025 · Updated last year
tgxs002 / align_sd
Better Aligning Text-to-Image Models with Human Preference. ICCV 2023
☆293 · Jul 14, 2023 · Updated 3 years ago
XueZeyue / DanceGRPO
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
☆1,642 · Oct 16, 2025 · Updated 9 months ago
pinterest / atg-research
☆74 · Sep 23, 2025 · Updated 10 months ago
google-research-datasets / richhf-18k
RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with t…
☆157 · Jun 25, 2024 · Updated 2 years ago
mapo-t2i / mapo
Official codebase for Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (MaPO).
☆83 · Jun 11, 2024 · Updated 2 years ago
masa-ue / RLfinetuning_Diffusion_Bioseq
Code for the tutorial/review paper on RL-based fine-tuning. In this code, we especially focus on the design of biological sequences li…
☆159 · Sep 15, 2024 · Updated last year
mihirp1998 / VADER
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope…
☆315 · Mar 12, 2025 · Updated last year
KlingAIResearch / VideoAlign
[NeurIPS 2025] Improving Video Generation with Human Feedback
☆489 · Sep 24, 2025 · Updated 10 months ago
FoundationVision / LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
☆1,960 · Aug 15, 2024 · Updated last year
mit-han-lab / fastcomposer
[IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
☆715 · Jan 10, 2025 · Updated last year
sihyun-yu / REPA
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
☆1,681 · Mar 16, 2025 · Updated last year
zhaoyl18 / SEIKO
SEIKO is a novel reinforcement learning method to efficiently fine-tune diffusion models in an online setting. Our methods outperform all…
☆30 · Jul 18, 2024 · Updated 2 years ago
Owen-Oertell / rlcm
☆58 · Sep 23, 2024 · Updated last year
TIGER-AI-Lab / VideoScore
Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024]
☆121 · Dec 4, 2025 · Updated 7 months ago
NVlabs / DiffusionNFT
[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
☆990 · Feb 10, 2026 · Updated 5 months ago
djghosh13 / geneval
GenEval: An object-focused framework for evaluating text-to-image alignment
☆472 · Mar 3, 2025 · Updated last year
ziqihuangg / ReVersion
[SIGGRAPH Asia 2024] ReVersion: Diffusion-Based Relation Inversion from Images
☆503 · Oct 7, 2025 · Updated 9 months ago
ExplainableML / ReNO
[NeurIPS 2024] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
☆166 · Sep 15, 2025 · Updated 10 months ago
willisma / SiT
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
☆1,191 · Dec 22, 2025 · Updated 7 months ago
openai / consistencydecoder
Consistency Distilled Diff VAE
☆2,213 · Nov 7, 2023 · Updated 2 years ago
facebookresearch / DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
☆8,693 · May 31, 2024 · Updated 2 years ago
UW-Madison-Lee-Lab / SFT-PG
Code for "Optimizing DDPM Sampling with Shortcut Fine-Tuning" (https://arxiv.org/abs/2301.13362), ICML 2023
☆30 · Oct 6, 2023 · Updated 2 years ago
christophschuhmann / improved-aesthetic-predictor
CLIP+MLP Aesthetic Score Predictor
☆1,328 · Jul 1, 2024 · Updated 2 years ago
tianweiy / DMD2
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
☆1,415 · Mar 5, 2025 · Updated last year
zai-org / VisionReward
[AAAI 2026] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
☆422 · Mar 26, 2025 · Updated last year
opendilab / awesome-diffusion-model-in-rl
A curated list of Diffusion Model in RL resources (continually updated)
☆1,630 · May 30, 2026 · Updated last month
CodeGoat24 / UnifiedReward
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex
☆796 · Jun 18, 2026 · Updated last month