yinyueqin/DenseRewardRLHF-PPO



This repository contains the code and released models for the paper "Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model", accepted at TMLR.

☆19

Alternatives and similar repositories for DenseRewardRLHF-PPO

Users who are interested in DenseRewardRLHF-PPO are comparing it to the repositories listed below.


heliossun / STLLaVA-Med
Self-training LLaVA for medical
☆16 · Nov 3, 2024 · Updated last year
KD-TAO / VidKV
VidKV: Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
☆25 · Mar 26, 2025 · Updated last year
wanglichenxj / Dual-Relation-Semi-supervised-Multi-label-Learning
☆23 · Sep 3, 2020 · Updated 5 years ago
Shentao-YANG / Preference_Grounded_Guidance
Source codes for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).
☆17 · Jan 8, 2025 · Updated last year
SLIT-AI / WRPO
[ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion
☆14 · Mar 17, 2025 · Updated last year
link-zju / ORL-Auditor
☆12 · Sep 8, 2023 · Updated 2 years ago
heliossun / SQ-LLaVA
Visual self-questioning for large vision-language assistant.
☆44 · Jul 23, 2025 · Updated last year
salesforce / GlueGen
☆65 · Jun 16, 2025 · Updated last year
Shentao-YANG / Dense_Reward_T2I
Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).
☆39 · May 9, 2024 · Updated 2 years ago
fjxmlzn / private-evolution-papers
The collection of papers about Private Evolution
☆18 · Jul 20, 2026 · Updated last week
bigcode-project / bigcodebench-annotation
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
☆26 · Aug 8, 2024 · Updated last year
MATPOWER / mpng
MPNG
☆11 · Sep 13, 2023 · Updated 2 years ago
tianjunz / TEMPERA
☆46 · Apr 10, 2023 · Updated 3 years ago
xi-mao / alexnet-cifar-10
AlexNet code for the CIFAR-10 dataset; after training, accuracy on the test set is 74%.
☆10 · Mar 14, 2018 · Updated 8 years ago
TokyoRobotics / torobo_isaac_lab
Reinforcement learning examples for Torobo based on IsaacLab
☆37 · Dec 3, 2024 · Updated last year
uncbiag / UniLMMV
☆11 · Mar 25, 2024 · Updated 2 years ago
GX-XinGao / GRA
The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis"
☆34 · Jun 13, 2025 · Updated last year
nuwuxian / RL_adv_valuediff
☆16 · Mar 24, 2023 · Updated 3 years ago
Henrygwb / edge
☆21 · Jan 17, 2022 · Updated 4 years ago
wyzjack / AdaM3
[ICDM 2023] Momentum is All You Need for Data-Driven Adaptive Optimization
☆26 · Mar 30, 2024 · Updated 2 years ago
NaCl-1374 / Trajectory-Optimization-for-Legged-Robots
Trajectory Optimization for Legged Robots using MATLAB and CasADi
☆12 · Apr 8, 2024 · Updated 2 years ago
Roblox / SmoothCache
Implementation of SmoothCache, a project aimed at speeding up Diffusion Transformer (DiT) based GenAI models with error-guided caching.
☆48 · Jul 17, 2025 · Updated last year
SunnierLee / DP-ImaGen
[USENIX Security 2024] PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretrainin…
☆24 · Nov 10, 2024 · Updated last year
sabagian / isaaclab-sac
☆16 · Jun 2, 2026 · Updated last month
ZQ-Struggle / AdvDoor
AdvDoor: Adversarial Backdoor Attack of Deep Learning System
☆32 · Nov 5, 2024 · Updated last year
MingSun-Tse / Regularization-Pruning
[ICLR'21] Neural Pruning via Growing Regularization (PyTorch)
☆82 · Jul 15, 2021 · Updated 5 years ago
shenxiaocam / CDNE
Network Together: Node Classification via Cross-Network Deep Network Embedding
☆11 · May 5, 2021 · Updated 5 years ago
uw-nsl / safechain
[ACL 25] SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
☆30 · Apr 2, 2025 · Updated last year
salesforce / HIVE
☆121 · Jun 2, 2026 · Updated last month
Panda-Peter / visda2019-semisupervised
Source code of our submission (Rank 2) for Semi-Supervised Domain Adaptation task in VisDA-2019
☆16 · Oct 10, 2019 · Updated 6 years ago
Seeed-Studio / ardupy-aip
ArduPy Integrated Platform is a utility to develop ArduPy and interact with ArduPy board. It enables users to quickly get started with …
☆13 · Jan 24, 2022 · Updated 4 years ago
visionjo / pykinship
SW components and demos for visual kinship recognition. An emphasis is put on the FIW dataset-- data loaders, benchmarks, results in summ…
☆17 · Mar 13, 2023 · Updated 3 years ago
2019ChenGong / Offline_RL_Poisoner
[S&P 2024] Replication Package for "Mind Your Data! Hiding Backdoors in Offline Reinforcement Learning Datasets".
☆33 · Dec 30, 2024 · Updated last year
jump-dev / MosekTools.jl
A MathOptInterface.jl interface to the MOSEK solver
☆28 · Apr 26, 2026 · Updated 3 months ago
debadeepta / vnla
Code accompanying the CVPR 2019 paper: https://arxiv.org/abs/1812.04155
☆61 · Mar 30, 2022 · Updated 4 years ago
UestcJay / TensorFlow2-GAN
tf2 implementations of gan.
☆42 · Feb 18, 2022 · Updated 4 years ago
lanl-ansi / GasPowerModels.jl
Julia packages for joint optimization of natural gas and power transmission networks
☆29 · Feb 2, 2024 · Updated 2 years ago
AI-secure / Shapley-Study
[CVPR 2021] Scalability vs. Utility: Do We Have to Sacrifice One for the Other in Data Importance Quantification?
☆34 · Dec 26, 2020 · Updated 5 years ago
nikhil-dce / adversarial-network-for-conditioned-feature-generation
Zero-Shot Learning using GAN
☆15 · Dec 1, 2017 · Updated 8 years ago