YangRui2015/Generalizable-Reward-Model

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/YangRui2015/Generalizable-Reward-Model)

YangRui2015 / Generalizable-Reward-Model

Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"

☆47

Alternatives and similar repositories for Generalizable-Reward-Model

Users that are interested in Generalizable-Reward-Model are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ASTRAL-Group / BDC-mitigation-assessment
View on GitHub
[ICML 2025] Official implementation for "The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for L…
☆15May 23, 2025Updated last year
BofangJia / SDM-Policy
View on GitHub
Score and Distribution Matching Policy: Advanced accelerated Visuomotor Policies via matched distillation
☆11May 9, 2025Updated last year
THU-KEG / RM-Bench
View on GitHub
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆84Jul 18, 2025Updated last year
dmksjfl / SEABO
View on GitHub
Official code for ICLR 2024 paper, SEABO: A Simple Search-Based Method for Offline Imitation Learning
☆12Jan 19, 2024Updated 2 years ago
tmscde / cuda-ts
View on GitHub
NVIDIA CUDA™ bindings exposed in TypeScript.
☆13Jan 3, 2023Updated 3 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
bond005 / runne_contrastive_ner
View on GitHub
This project is concerned with my participating in the RuNNE competition https://github.com/dialogue-evaluation/RuNNE
☆13Jun 28, 2023Updated 3 years ago
srzer / MOD
View on GitHub
Official code for "Decoding-Time Language Model Alignment with Multiple Objectives".
☆29Oct 30, 2024Updated last year
w-yibo / R1-Compress
View on GitHub
[NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
☆17Jan 24, 2026Updated 5 months ago
asuszko / signal_cross_correlation
View on GitHub
Custom CUDA kernel doing a normalized cross correlation on a batch of signals via pycu_interface.
☆10Apr 11, 2018Updated 8 years ago
hukz18 / Stem-Ob-Code
View on GitHub
Official repo for arxiv paper "Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion I…
☆17Nov 8, 2024Updated last year
yokhon / Words2Wheels
View on GitHub
From Words to Wheels: Automated Style-Customized Policy Generation for Autonomous Driving
☆12Mar 16, 2025Updated last year
frt03 / jax_dt
View on GitHub
Minimal Decision Transformer Implementation written in Jax (Flax).
☆18Aug 8, 2022Updated 3 years ago
bond005 / smart_chunker
View on GitHub
This is a smart chunker for efficient preparing of long document for RAG
☆14Mar 24, 2026Updated 3 months ago
pdfosborne / elsciRL
View on GitHub
The core repository of the elsciRL framework.
☆18Dec 8, 2025Updated 7 months ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
VITA-Group / Linearity-Grafting
View on GitHub
[ICML 2022] "Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness" by Tianlong Chen*, Huan Zhang*, Zhenyu Zhang, Shiyu…
☆16Jun 22, 2022Updated 4 years ago
chongyi-zheng / value-flows
View on GitHub
The official implementation of Value Flows
☆55Feb 27, 2026Updated 4 months ago
svenpeter42 / LightGBM-CEGB
View on GitHub
Fork of Microsoft/LightGBM to include support for the CEGB (Cost Efficient Gradient Boosting) algorithm. Original repository at https://g…
☆13Jun 30, 2017Updated 9 years ago
wenzhe-li / romi
View on GitHub
Code for NeurIPS 2021 paper "Offline Reinforcement Learning with Reverse Model-based Imagination"
☆20Dec 22, 2021Updated 4 years ago
xiye17 / StructuredRegex
View on GitHub
Data and Code for StructuredRegex.
☆14Nov 16, 2023Updated 2 years ago
OpenHelix-Team / VAMPO
View on GitHub
☆32Jun 7, 2026Updated last month
sharkwyf / critic-guided-decision-transformer
View on GitHub
[AAAI'2024] Critic-Guided Decision Transformer for Offline Reinforcement Learning
☆18May 21, 2025Updated last year
IBM / benchbench
View on GitHub
A package dedicated for running benchmark agreement testing
☆19Sep 18, 2025Updated 10 months ago
google-deepmind / dmc_vision_benchmark
View on GitHub
☆34Jun 21, 2024Updated 2 years ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
keven980716 / weak-to-strong-deception
View on GitHub
[ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"
☆15Jun 21, 2024Updated 2 years ago
ruizheng20 / gpo
View on GitHub
The code of paper "Toward Optimal LLM Alignments Using Two-Player Games".
☆17Jun 20, 2024Updated 2 years ago
hychen-naza / LEAP
View on GitHub
☆17Sep 28, 2023Updated 2 years ago
locuslab / intermediate_robustness
View on GitHub
☆15Dec 7, 2021Updated 4 years ago
ml-jku / L2M
View on GitHub
Learning to Modulate pre-trained Models in RL (Decision Transformer, LoRA, Fine-tuning)
☆61Oct 6, 2024Updated last year
YangRui2015 / Model-basedHER
View on GitHub
Model-based Hindsight Experience Replay
☆10Jun 8, 2022Updated 4 years ago
kangluoyao / VAP_Former
View on GitHub
[MICCAI-2023]Visual-Attribute Prompt Learning for Progressive Mild Cognitive Impairment Prediction
☆15Dec 12, 2023Updated 2 years ago
YihanWang617 / On-ell_p-Robustness-of-Ensemble-Stumps-and-Trees
View on GitHub
Code of On L-p Robustness of Decision Stumps and Trees, ICML 2020
☆10Aug 3, 2020Updated 5 years ago
clvrai / boss
View on GitHub
Code for the paper Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance, accepted to CoRL 2023 as an…
☆35Jul 15, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
shuishida / LangProp
View on GitHub
☆60Mar 22, 2024Updated 2 years ago
LQNew / Continuous_Control_Benchmark
View on GitHub
Benchmark data (i.e., DeepMind Control Suite and MuJoCo) for RL.
☆35Jan 23, 2021Updated 5 years ago
ScalerLab / JudgeBench
View on GitHub
☆128Nov 7, 2024Updated last year
VNN-COMP / vnncomp2023
View on GitHub
Fourth edition of VNN COMP (2023)
☆16Apr 12, 2023Updated 3 years ago
dunnolab / xland-minigrid-datasets
View on GitHub
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning - - — ICLR 2025
☆84Feb 13, 2025Updated last year
tesslerc / TD3-JAX
View on GitHub
A JAX Implementation of the Twin Delayed DDPG Algorithm
☆35Mar 12, 2020Updated 6 years ago
zankner / CLoud
View on GitHub
Critique-out-Loud Reward Models
☆76Oct 18, 2024Updated last year