hkust-nlp / RL-Verifier-Pitfalls

Pitfalls of Rule- and Model-based Verifiers: A Case Study on Mathematical Reasoning.

☆23

Alternatives and similar repositories for RL-Verifier-Pitfalls

Users who are interested in RL-Verifier-Pitfalls are comparing it to the repositories listed below.

sail-sg / ActivePRM
☆18 Updated 5 months ago
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆22 Updated 8 months ago
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆34 Updated last year
jinzhuoran / RAG-RewardBench
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆16 Updated 9 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆67 Updated 2 months ago
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆59 Updated 9 months ago
GAIR-NLP / self-improvement-reversal
☆13 Updated last year
rookie-joe / AutoPSV
☆49 Updated 10 months ago
Yifan-Song793 / GoodBadGreedy
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
☆30 Updated last year
hkust-nlp / Laser
Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆53 Updated 3 months ago
kiaia / GIRAFFE
Extending context length of visual language models
☆12 Updated 9 months ago
GAIR-NLP / weak-to-strong-reasoning
☆59 Updated last year
TingchenFu / MathIF
instruction-following benchmark for large reasoning models
☆40 Updated last month
hanxuhu / SeqIns
The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…
☆29 Updated 9 months ago
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆69 Updated 9 months ago
thu-coai / BARREL
☆16 Updated 3 months ago
ZhentingWang / DUMP
☆26 Updated 4 months ago
THU-KEG / RM-Bench
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆61 Updated 2 months ago
test-time-interaction / TTI
☆60 Updated 3 months ago
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆38 Updated last year
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44 Updated 5 months ago
rhyang2021 / ARIA
Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".
☆22 Updated last month
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆37 Updated 11 months ago
hkust-nlp / GUIMid
☆21 Updated 4 months ago
WangHanLinHenry / SPA-RL-Agent
Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution"
☆42 Updated last week
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆63 Updated last year
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆61 Updated last year
YiCheng98 / IntegrativeDecoding
Official Implementation for the paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency"
☆30 Updated 5 months ago
UCSB-NLP-Chang / ThinkPrune
☆43 Updated 5 months ago
bobxwu / learning-from-rewards-llm-papers
A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…
☆56 Updated 3 months ago