☆52 Mar 5, 2025 Updated last year
Alternatives and similar repositories for MARIO_EVAL
Users that are interested in MARIO_EVAL are comparing it to the libraries listed below.
- [ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large … ☆25 May 29, 2024 Updated last year
- ☆30 Dec 27, 2024 Updated last year
- Evaluation utilities based on SymPy. ☆22 Dec 12, 2024 Updated last year
- ☆27 Aug 31, 2022 Updated 3 years ago
- [SIGIR24] Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval ☆18 Feb 29, 2024 Updated 2 years ago
- A LLaMA1/LLaMA2 Megatron implementation. ☆28 Dec 13, 2023 Updated 2 years ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆189 May 20, 2025 Updated 10 months ago
- [MathCoder, MathCoder-VL] Family of LLMs/LMMs for mathematical reasoning. ☆338 Oct 18, 2025 Updated 5 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆149 Oct 27, 2024 Updated last year
- The official repository of the Omni-MATH benchmark. ☆93 Dec 22, 2024 Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆73 Feb 25, 2025 Updated last year
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆274 Apr 26, 2024 Updated last year
- The OlymMATH dataset ☆24 Jun 1, 2025 Updated 10 months ago
- Official repository for ALT (ALignment with Textual feedback). ☆10 Jul 25, 2024 Updated last year
- [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset ☆112 May 22, 2025 Updated 10 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆121 Dec 10, 2024 Updated last year
- GAOKAO-Bench-Updates is a supplement to the GAOKAO-Bench, a dataset to evaluate large language models. ☆41 Jan 7, 2025 Updated last year
- EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization ☆13 Mar 20, 2025 Updated last year
- ☆85 Jul 10, 2024 Updated last year
- A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low a… ☆27 Feb 14, 2025 Updated last year
- Improving word mover’s distance by leveraging self-attention matrix (Published in EMNLP 2023 Findings) ☆10 Mar 10, 2026 Updated 3 weeks ago
- ☆12 Apr 25, 2024 Updated last year
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆95 Nov 8, 2025 Updated 5 months ago
- Debiasing Through Data Attribution ☆13 May 23, 2024 Updated last year
- ☆13 Sep 27, 2022 Updated 3 years ago
- ☆32 Jun 24, 2024 Updated last year
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings. ☆32 Dec 6, 2023 Updated 2 years ago
- Kaggle AIMO2 solution with token-efficient reasoning LLM recipes ☆46 Aug 7, 2025 Updated 8 months ago
- ☆12 Feb 16, 2024 Updated 2 years ago
- CAShift: Benchmarking Log-Based Cloud Attack Detection under Normality Shift (FSE 2025) ☆13 May 19, 2025 Updated 10 months ago
- Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" [ICLR 2024] ☆384 Aug 25, 2024 Updated last year
- ☆35 Sep 1, 2022 Updated 3 years ago
- End-to-end Speech Translation with Stacked Acoustic-and-Textual Encoding ☆26 Aug 12, 2021 Updated 4 years ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆32 Jun 16, 2024 Updated last year
- Official Repo of "CIBench: Evaluation of LLMs as Code Interpreter" ☆14 Jul 19, 2024 Updated last year
- PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration ☆43 Jan 7, 2026 Updated 3 months ago
- Code for ACL 2024 findings paper "wav2vec-S: Adapting Pre-trained Speech Models for Streaming" ☆12 Apr 20, 2025 Updated 11 months ago
- Paper: Relational Sentence Embedding for Flexible Semantic Matching ☆12 May 22, 2024 Updated last year
- The code implementation for TTCS: Test-Time Curriculum Synthesis for Self-Evolving. ☆41 Mar 8, 2026 Updated last month