MathEval is a benchmark dedicated to the holistic evaluation of the mathematical capabilities of LLMs.
☆87Nov 15, 2024Updated last year
Alternatives and similar repositories for MathEval
Users that are interested in MathEval are comparing it to the libraries listed below.
- ☆22Sep 10, 2021Updated 4 years ago
- The official repository of the Omni-MATH benchmark.☆94Dec 22, 2024Updated last year
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨☆277Apr 26, 2024Updated 2 years ago
- PULSE-EVAL☆24Jan 12, 2024Updated 2 years ago
- The Mix of Minimal Optimal Sets (MMOS) dataset has two advantages: higher performance and lower construction costs on math…☆73Jul 27, 2024Updated last year
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI☆106Mar 6, 2025Updated last year
- ☆30Dec 27, 2024Updated last year
- Code for ICML21 paper "Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation"☆12Feb 8, 2023Updated 3 years ago
- Code for the ACL2022 paper "Synthetic Question Value Estimation for Domain Adaptation of Question Answering"☆18Mar 21, 2022Updated 4 years ago
- TSQA: Tabular Scenario Based Question Answering (AAAI 2021)☆18Dec 17, 2020Updated 5 years ago
- An Experiment on Dynamic NTK Scaling RoPE☆65Nov 26, 2023Updated 2 years ago
- ☆19Jul 31, 2025Updated 11 months ago
- ☆10Dec 28, 2023Updated 2 years ago
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003)☆23Oct 2, 2025Updated 8 months ago
- The dataset and code for paper: TheoremQA: A Theorem-driven Question Answering dataset☆161Apr 23, 2024Updated 2 years ago
- Graph4Tree is a simple example code for our EMNLP'20 Findings paper idea.☆26Nov 18, 2020Updated 5 years ago
- We systematically studied the factors that influence how LLMs generate benchmarks. By using our code, you can generate high-quality QA datas…☆20May 20, 2025Updated last year
- ☆14May 12, 2025Updated last year
- A PyTorch implementation of "Application of Deep Self-Attention in Knowledge Tracing"☆10May 21, 2021Updated 5 years ago
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆87Dec 14, 2023Updated 2 years ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆66Jul 8, 2024Updated last year
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.☆258Oct 30, 2024Updated last year
- pyKT: A Python Library to Benchmark Deep Learning based Knowledge Tracing Models☆410Jun 4, 2026Updated 3 weeks ago
- the instructions and demonstrations for building a formal logical reasoning capable GLM☆54Sep 3, 2024Updated last year
- Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest Cost☆39Jun 21, 2023Updated 3 years ago
- [ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks☆33Sep 20, 2024Updated last year
- Source code of Venus-MAXWELL: Efficient Learning of Protein-Mutation Stability Landscapes using Protein Language Models☆25Jun 3, 2025Updated last year
- ☆19Oct 13, 2025Updated 8 months ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study☆42Mar 8, 2023Updated 3 years ago
- Codebase for EnterpriseOps-Gym from ServiceNow☆99Jun 3, 2026Updated 3 weeks ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- A custom line-wrap (flow) layout that supports setting a maximum number of lines.☆10Apr 13, 2018Updated 8 years ago
- [MathCoder, MathCoder-VL] Family of LLMs/LMMs for mathematical reasoning.☆339Oct 18, 2025Updated 8 months ago
- LLM evaluation.☆16Nov 7, 2023Updated 2 years ago
- Official code for the paper "CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules"☆49Jun 2, 2026Updated 3 weeks ago
- The official implementation of Latte: Latent Diffusion Transformer for Video Generation.☆34Feb 26, 2025Updated last year
- ☆16Feb 28, 2023Updated 3 years ago
- ☆21Apr 16, 2025Updated last year
- Code and data to support Bamman et al. (2020), "A Dataset of Literary Coreference" (LREC)☆10Dec 8, 2022Updated 3 years ago