MARIO-Math-Reasoning/MARIO_EVAL

☆52

Alternatives and similar repositories for MARIO_EVAL

Users who are interested in MARIO_EVAL are comparing it to the libraries listed below.

MARIO-Math-Reasoning / Super_MARIO
☆342 · Jun 5, 2025 · Updated 8 months ago
MARIO-Math-Reasoning / MARIO
☆29 · May 8, 2024 · Updated last year
ChengpengLi1003 / DotaMath
☆30 · Dec 27, 2024 · Updated last year
conceptmath / conceptmath
[ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large …
☆24 · May 29, 2024 · Updated last year
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆26 · Dec 23, 2024 · Updated last year
sauc-abadal / ALT
Official repository for ALT (ALignment with Textual feedback).
☆10 · Jul 25, 2024 · Updated last year
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆93 · Dec 22, 2024 · Updated last year
mathllm / MathCoder
[MathCoder, MathCoder-VL] Family of LLMs/LMMs for mathematical reasoning.
☆335 · Oct 18, 2025 · Updated 4 months ago
ma787639046 / bowdpr
[SIGIR24] Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval
☆18 · Feb 29, 2024 · Updated 2 years ago
SynthLabsAI / big-math
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
☆72 · Feb 25, 2025 · Updated last year
AIMO-CMU-MATH / CMU_MATH-AIMO
☆84 · Jul 10, 2024 · Updated last year
RUCAIBox / OlymMATH
The OlymMATH dataset
☆23 · Jun 1, 2025 · Updated 8 months ago
sarahmart / HARDMath
A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low a…
☆26 · Feb 14, 2025 · Updated last year
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆149 · Oct 27, 2024 · Updated last year
SeungyounShin / minimal-r1
☆26 · Feb 11, 2025 · Updated last year
huggingface / Math-Verify
☆1,098 · Jan 10, 2026 · Updated last month
WooooDyy / LLM-Reverse-Curriculum-RL
Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…
☆116 · Feb 9, 2024 · Updated 2 years ago
open-compass / MathBench
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
☆111 · May 22, 2025 · Updated 9 months ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆120 · Dec 10, 2024 · Updated last year
MetaCopilot / dseval
☆31 · Jun 24, 2024 · Updated last year
whyNLP / Conic10K
Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings.
☆31 · Dec 6, 2023 · Updated 2 years ago
apple / ml-entity-deduction-arena
☆37 · May 31, 2024 · Updated last year
danliu2 / caat
☆35 · Sep 1, 2022 · Updated 3 years ago
OpenLMLab / GAOKAO-Bench-Updates
GAOKAO-Bench-Updates is a supplement to GAOKAO-Bench, a dataset for evaluating large language models.
☆38 · Jan 7, 2025 · Updated last year
RyanLiu112 / GenPRM
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆94 · Nov 8, 2025 · Updated 3 months ago
bin123apple / MACM
[NeurIPS 2024] MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems
☆92 · Jul 24, 2024 · Updated last year
PremiLab-Math / MathCheck
[ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
☆35 · Oct 23, 2024 · Updated last year
Baran-phys / Tropical-Attention
[NeurIPS 2025] Official code for "Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms"
☆23 · Oct 23, 2025 · Updated 4 months ago
HITsz-TMG / ICL-State-Vector
☆12 · Jul 4, 2024 · Updated last year
MadryLab / D3M
Debiasing Through Data Attribution
☆12 · May 23, 2024 · Updated last year
kfdong / STP
The official implementation of "Self-play LLM Theorem Provers with Iterative Conjecturing and Proving"
☆117 · Mar 28, 2025 · Updated 10 months ago
dqqcasia / st
End-to-end Speech Translation
☆35 · Apr 12, 2021 · Updated 4 years ago
THUDM / ReST-MCTS
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
☆692 · Jan 20, 2025 · Updated last year
RickySkywalker / TheoremLlama
The official repository for all TheoremLlama code.
☆47 · Aug 4, 2025 · Updated 6 months ago
JunyiYe / CreativeMath
[AAAI 2025] Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems
☆12 · May 5, 2025 · Updated 9 months ago
gao-xiao-bai / JsonTuning
JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning
☆10 · Nov 3, 2024 · Updated last year
eth-lre / PedagogicalRL
Multi-turn RL framework for aligning models to be tutors instead of answerers. EMNLP 2025 Oral
☆31 · Dec 11, 2025 · Updated 2 months ago
nimpy / inpynting
Image inpainting using Markov random field modelling
☆11 · Jun 30, 2021 · Updated 4 years ago
biaofuxmu / wav2vec-S
Code for ACL 2024 findings paper "wav2vec-S: Adapting Pre-trained Speech Models for Streaming"
☆10 · Apr 20, 2025 · Updated 10 months ago