princeton-pli / what-makes-good-rm
[NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective
☆42 · Sep 18, 2025 · Updated 4 months ago
Alternatives and similar repositories for what-makes-good-rm
Users who are interested in what-makes-good-rm are comparing it to the repositories listed below.
- [ICML 2025] Official code of "AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization" ☆30 · Jan 10, 2026 · Updated last month
- ☆19 · Mar 25, 2025 · Updated 10 months ago
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- ☆20 · Aug 30, 2025 · Updated 5 months ago
- ☆15 · Jan 21, 2026 · Updated 3 weeks ago
- ☆17 · Aug 1, 2025 · Updated 6 months ago
- ☆70 · Jun 18, 2025 · Updated 8 months ago
- ☆27 · Dec 14, 2023 · Updated 2 years ago
- LCA-on-the-line (ICML 2024 Oral) ☆13 · Feb 13, 2025 · Updated last year
- ☆16 · Dec 11, 2025 · Updated 2 months ago
- Official code for the paper: DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models ☆21 · Jan 6, 2026 · Updated last month
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi… ☆14 · Jun 6, 2025 · Updated 8 months ago
- ☆13 · Jan 22, 2025 · Updated last year
- [EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Langua… ☆13 · Nov 11, 2024 · Updated last year
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆149 · Oct 10, 2025 · Updated 4 months ago
- [NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding" ☆22 · Dec 8, 2024 · Updated last year
- Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games" ☆24 · Dec 14, 2025 · Updated 2 months ago
- ☆15 · Nov 4, 2021 · Updated 4 years ago
- CS194-196 Course Project ☆14 · Feb 20, 2025 · Updated 11 months ago
- ☆23 · Jul 20, 2025 · Updated 6 months ago
- Codebase for Math Neurosurgery: Isolating LLMs' Math Reasoning Abilities Using Only Forward Passes ☆21 · Jun 15, 2025 · Updated 8 months ago
- [ACL 2025] Can MLLMs Understand the Deep Implication Behind Chinese Images? ☆20 · Oct 20, 2025 · Updated 3 months ago
- ☆33 · Jan 7, 2025 · Updated last year
- JudgeLRM: Large Reasoning Models as a Judge ☆41 · Jan 29, 2026 · Updated 2 weeks ago
- ☆26 · Aug 21, 2025 · Updated 5 months ago
- [ACL 2025 Findings] Text2World: Benchmarking Large Language Models for Symbolic World Model Generation ☆27 · Feb 25, 2025 · Updated 11 months ago
- ☆20 · Apr 16, 2025 · Updated 10 months ago
- Convert CVXPY expressions to PyTorch expressions ☆19 · Jul 8, 2025 · Updated 7 months ago
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features" ☆17 · Mar 31, 2025 · Updated 10 months ago
- [NeurIPS '25] Multi-Token Prediction Needs Registers ☆26 · Dec 14, 2025 · Updated 2 months ago
- Code and data for paper "Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?" (ACL 2025 Main) ☆20 · Jun 18, 2025 · Updated 8 months ago
- \infty-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation ☆19 · Feb 14, 2025 · Updated last year
- ☆41 · Jan 4, 2026 · Updated last month
- [EMNLP 2024] Tree of Problems: Improving structured problem solving with compositionality ☆19 · Mar 4, 2025 · Updated 11 months ago
- ☆18 · Mar 6, 2024 · Updated last year
- Official repository of DialSim ☆28 · Oct 31, 2025 · Updated 3 months ago
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆16 · Apr 21, 2025 · Updated 9 months ago
- This repository contains the replication of the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Zhu. ☆17 · Sep 13, 2024 · Updated last year
- R3: Robust Rubric-Agnostic Reward Models ☆20 · Jul 12, 2025 · Updated 7 months ago