KbsdJames / omni-math-ruleLinks

The rule-based evaluation subset and code implementation of Omni-MATH

☆23

Alternatives and similar repositories for omni-math-rule

Users that are interested in omni-math-rule are comparing it to the libraries listed below

Sorting:

GAIR-NLP / self-improvement-reversal
☆13Updated last year
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆88Updated 10 months ago
GAIR-NLP / weak-to-strong-reasoning
☆58Updated last year
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆34Updated last year
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆73Updated 2 weeks ago
rookie-joe / AutoPSV
☆50Updated 11 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆114Updated 5 months ago
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆81Updated 8 months ago
hkust-nlp / Laser
Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆55Updated 5 months ago
THU-KEG / RM-Bench
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆64Updated 3 months ago
zzli2022 / TLDR
Code for Research Project TLDR
☆23Updated 2 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆69Updated 3 months ago
TianHongZXY / RLVR-Decomposed
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆113Updated 2 months ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆115Updated 10 months ago
WangHanLinHenry / SPA-RL-Agent
Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution"
☆46Updated last month
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆63Updated last year
WeiminXiong / IPR
Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)
☆62Updated last year
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆37Updated last year
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆82Updated 9 months ago
hkust-nlp / RL-Verifier-Robustness
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
☆23Updated 2 weeks ago
icip-cas / Verifier-Engineering
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
☆61Updated 10 months ago
test-time-interaction / TTI
☆63Updated 4 months ago
hanxuhu / SeqIns
The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…
☆29Updated 11 months ago
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆174Updated 5 months ago
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆67Updated last year
FreedomIntelligence / OVM
☆69Updated last year
sail-sg / ActivePRM
☆19Updated 6 months ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆177Updated 3 months ago
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆69Updated 6 months ago
koalazf99 / nanoverl
Collections of RLxLM experiments using minimal codes
☆14Updated 8 months ago