☆52 Mar 5, 2025 Updated 11 months ago
Alternatives and similar repositories for MARIO_EVAL
Users who are interested in MARIO_EVAL are comparing it to the repositories listed below
- ☆342 Jun 5, 2025 Updated 8 months ago
- ☆29 May 8, 2024 Updated last year
- ☆30 Dec 27, 2024 Updated last year
- [ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large … ☆24 May 29, 2024 Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH ☆26 Dec 23, 2024 Updated last year
- Official repository for ALT (ALignment with Textual feedback). ☆10 Jul 25, 2024 Updated last year
- The official repository of the Omni-MATH benchmark. ☆93 Dec 22, 2024 Updated last year
- [MathCoder, MathCoder-VL] Family of LLMs/LMMs for mathematical reasoning. ☆335 Oct 18, 2025 Updated 4 months ago
- [SIGIR24] Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval ☆18 Feb 29, 2024 Updated 2 years ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆72 Feb 25, 2025 Updated last year
- ☆84 Jul 10, 2024 Updated last year
- The OlymMATH dataset ☆23 Jun 1, 2025 Updated 8 months ago
- A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low a… ☆26 Feb 14, 2025 Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆149 Oct 27, 2024 Updated last year
- ☆26 Feb 11, 2025 Updated last year
- ☆1,098 Jan 10, 2026 Updated last month
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆116 Feb 9, 2024 Updated 2 years ago
- [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset ☆111 May 22, 2025 Updated 9 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆120 Dec 10, 2024 Updated last year
- ☆31 Jun 24, 2024 Updated last year
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings. ☆31 Dec 6, 2023 Updated 2 years ago
- ☆37 May 31, 2024 Updated last year
- ☆35 Sep 1, 2022 Updated 3 years ago
- GAOKAO-Bench-Updates is a supplement to the GAOKAO-Bench, a dataset to evaluate large language models. ☆38 Jan 7, 2025 Updated last year
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆94 Nov 8, 2025 Updated 3 months ago
- [NeurIPS 2024] MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems ☆92 Jul 24, 2024 Updated last year
- [ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist ☆35 Oct 23, 2024 Updated last year
- [NeurIPS 2025] Official code for "Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms" ☆23 Oct 23, 2025 Updated 4 months ago
- ☆12 Jul 4, 2024 Updated last year
- Debiasing Through Data Attribution ☆12 May 23, 2024 Updated last year
- The official implementation of "Self-play LLM Theorem Provers with Iterative Conjecturing and Proving" ☆117 Mar 28, 2025 Updated 10 months ago
- End-to-end Speech Translation ☆35 Apr 12, 2021 Updated 4 years ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆692 Jan 20, 2025 Updated last year
- This is the official repository for all the code of TheoremLlama ☆47 Aug 4, 2025 Updated 6 months ago
- [AAAI 2025] Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems ☆12 May 5, 2025 Updated 9 months ago
- JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning ☆10 Nov 3, 2024 Updated last year
- Multi-turn RL framework for aligning models to be tutors instead of answerers. EMNLP 2025 Oral ☆31 Dec 11, 2025 Updated 2 months ago
- Image inpainting using Markov random field modelling ☆11 Jun 30, 2021 Updated 4 years ago
- Code for ACL 2024 findings paper "wav2vec-S: Adapting Pre-trained Speech Models for Streaming" ☆10 Apr 20, 2025 Updated 10 months ago