A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low accuracy in solving these problems.
☆29 · Feb 14, 2025 · Updated last year
Alternatives and similar repositories for HARDMath
Users who are interested in HARDMath are comparing it to the repositories listed below.
- The rule-based evaluation subset and code implementation of Omni-MATH ☆27 · Dec 23, 2024 · Updated last year
- ☆30 · Dec 27, 2024 · Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*… ☆14 · Aug 8, 2025 · Updated 10 months ago
- The OlymMATH dataset ☆25 · Jun 1, 2025 · Updated last year
- ☆84 · Jan 25, 2025 · Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆74 · Feb 25, 2025 · Updated last year
- ☆22 · Jan 31, 2025 · Updated last year
- Kaggle AIMO2 solution with token-efficient reasoning LLM recipes ☆50 · Aug 7, 2025 · Updated 10 months ago
- [npj Digital Medicine'25] Continuous sleep depth index annotation with deep learning yields novel digital biomarkers for sleep health ☆16 · Apr 13, 2025 · Updated last year
- Repository for the paper: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning ☆18 · Feb 21, 2025 · Updated last year
- ☆12 · Jun 5, 2024 · Updated 2 years ago
- ☆11 · Jul 15, 2020 · Updated 5 years ago
- Modern development with Python in 2024 ☆12 · Jun 22, 2026 · Updated last week
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆38 · Jul 25, 2024 · Updated last year
- ☆11 · Jan 2, 2022 · Updated 4 years ago
- [ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie… ☆196 · Jun 8, 2025 · Updated last year
- A curated list of cutting-edge research papers and resources on Long Chain-of-Thought (CoT) Reasoning with Tools. ☆47 · Dec 17, 2025 · Updated 6 months ago
- ☆73 · Jun 23, 2025 · Updated last year
- ☆19 · Jun 4, 2024 · Updated 2 years ago
- ☆12 · May 23, 2022 · Updated 4 years ago
- Multimodal Compact Bilinear Pooling class in Python ☆12 · Sep 17, 2019 · Updated 6 years ago
- ☆52 · Oct 5, 2020 · Updated 5 years ago
- As defined in Lubotzky, Philips and Sarnak ☆10 · Oct 25, 2022 · Updated 3 years ago
- ☆80 · Nov 19, 2024 · Updated last year
- LaTeX Beamer template crafted for University of Illinois Chicago ☆12 · Dec 7, 2024 · Updated last year
- Code and data to support Bamman et al. (2020), "A Dataset of Literary Coreference" (LREC) ☆10 · Dec 8, 2022 · Updated 3 years ago
- Codes for coreference-aware machine reading comprehension ☆13 · Mar 13, 2022 · Updated 4 years ago
- ☆15 · Jul 1, 2020 · Updated 6 years ago
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference" ☆23 · Feb 16, 2025 · Updated last year
- [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset ☆115 · May 22, 2025 · Updated last year
- Top Picks for Data Science Self-Study: From Newbies to Pros! ☆11 · Apr 2, 2024 · Updated 2 years ago
- Mix of Minimal Optimal Sets (MMOS) of dataset has two advantages for two aspects, higher performance and lower construction costs on math… ☆73 · Jul 27, 2024 · Updated last year
- Benchmarking Benchmark Leakage in Large Language Models ☆61 · May 20, 2024 · Updated 2 years ago
- [EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners ☆27 · Dec 11, 2024 · Updated last year
- This repository provides simulator codes for predicting and tracking popular discussion threads on Reddit ☆21 · Sep 10, 2016 · Updated 9 years ago
- Code for SLT 2016 paper on Grapheme-to-Phoneme conversion using attention based encoder-decoder models ☆15 · Feb 20, 2019 · Updated 7 years ago
- Elastic Workplace Search Official Python Client ☆10 · Aug 8, 2024 · Updated last year
- Dataset for AAAI paper "Natural Language Inference in Context - Investigating Contextual Reasoning over Long Texts" ☆11 · Nov 18, 2022 · Updated 3 years ago
- Ant Financial natural language processing competition. ☆10 · Sep 3, 2018 · Updated 7 years ago