☆52 Mar 5, 2025 Updated last year
Alternatives and similar repositories for MARIO_EVAL
Users who are interested in MARIO_EVAL are comparing it to the libraries listed below.
- ☆342 Jun 5, 2025 Updated 9 months ago
- ☆29 May 8, 2024 Updated last year
- [ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large … ☆24 May 29, 2024 Updated last year
- ☆30 Dec 27, 2024 Updated last year
- Evaluation utilities based on SymPy. ☆21 Dec 12, 2024 Updated last year
- [SIGIR24] Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval ☆18 Feb 29, 2024 Updated 2 years ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆187 May 20, 2025 Updated 10 months ago
- [MathCoder, MathCoder-VL] Family of LLMs/LMMs for mathematical reasoning. ☆336 Oct 18, 2025 Updated 5 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆273 Apr 26, 2024 Updated last year
- The OlymMATH dataset ☆24 Jun 1, 2025 Updated 9 months ago
- Official repository for ALT (ALignment with Textual feedback). ☆10 Jul 25, 2024 Updated last year
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆121 Dec 10, 2024 Updated last year
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆54 Oct 1, 2024 Updated last year
- EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization ☆13 Mar 20, 2025 Updated last year
- ☆1,111 Jan 10, 2026 Updated 2 months ago
- ☆85 Jul 10, 2024 Updated last year
- Improving word mover’s distance by leveraging self-attention matrix (Published in EMNLP 2023 Findings) ☆10 Mar 10, 2026 Updated last week
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆94 Nov 8, 2025 Updated 4 months ago
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings. ☆31 Dec 6, 2023 Updated 2 years ago
- Debiasing Through Data Attribution ☆12 May 23, 2024 Updated last year
- ☆13 Sep 27, 2022 Updated 3 years ago
- ☆31 Jun 24, 2024 Updated last year
- RL Scaling and Test-Time Scaling (ICML'25) ☆114 Jan 23, 2025 Updated last year
- Kaggle AIMO2 solution with token-efficient reasoning LLM recipes ☆44 Aug 7, 2025 Updated 7 months ago
- ☆12 Feb 16, 2024 Updated 2 years ago
- Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" [ICLR 2024] ☆383 Aug 25, 2024 Updated last year
- Chinese generation tasks with MBart, a multilingual denoising pre-trained model ☆12 May 27, 2021 Updated 4 years ago
- A flexible and efficient training framework for large-scale alignment tasks ☆450 Oct 23, 2025 Updated 4 months ago
- Multi-turn RL framework for aligning models to be tutors instead of answerers. EMNLP 2025 Oral ☆33 Dec 11, 2025 Updated 3 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆32 Jun 16, 2024 Updated last year
- Official Repo of "CIBench: Evaluation of LLMs as Code Interpreter" ☆14 Jul 19, 2024 Updated last year
- ☆10 May 9, 2021 Updated 4 years ago
- The code implementation for TTCS: Test-Time Curriculum Synthesis for Self-Evolving. ☆39 Mar 8, 2026 Updated last week
- PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration ☆42 Jan 7, 2026 Updated 2 months ago
- Paper: Relational Sentence Embedding for Flexible Semantic Matching ☆12 May 22, 2024 Updated last year
- Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling' ☆27 May 16, 2025 Updated 10 months ago
- A platform for Interactive AI-assisted Hypothesis Generation [ACL 2025] ☆28 Aug 18, 2025 Updated 7 months ago
- Code for EMNLP 2021 Paper "Recall and Learn: A Memory-augmented Solver for Math Word Problems". ☆16 Oct 20, 2022 Updated 3 years ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆116 Feb 9, 2024 Updated 2 years ago