A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low accuracy in solving these problems.
☆28 · Feb 14, 2025 · Updated last year
Alternatives and similar repositories for HARDMath
Users who are interested in HARDMath are comparing it to the libraries listed below.
- The rule-based evaluation subset and code implementation of Omni-MATH ☆27 · Dec 23, 2024 · Updated last year
- The official repository of the Omni-MATH benchmark. ☆93 · Dec 22, 2024 · Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*… ☆14 · Aug 8, 2025 · Updated 8 months ago
- ☆84 · Jan 25, 2025 · Updated last year
- ☆23 · Jan 31, 2025 · Updated last year
- Kaggle AIMO2 solution with token-efficient reasoning LLM recipes ☆51 · Aug 7, 2025 · Updated 8 months ago
- [npj Digital Medicine'25] Continuous sleep depth index annotation with deep learning yields novel digital biomarkers for sleep health ☆16 · Apr 13, 2025 · Updated last year
- Repository for the paper: Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning ☆18 · Feb 21, 2025 · Updated last year
- ☆14 · Oct 21, 2024 · Updated last year
- ☆12 · Jun 5, 2024 · Updated last year
- ☆60 · Jun 23, 2025 · Updated 10 months ago
- ☆11 · Jul 15, 2020 · Updated 5 years ago
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆38 · Jul 25, 2024 · Updated last year
- ☆11 · Jan 2, 2022 · Updated 4 years ago
- [ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie… ☆191 · Jun 8, 2025 · Updated 10 months ago
- A curated list of cutting-edge research papers and resources on Long Chain-of-Thought (CoT) Reasoning with Tools. ☆47 · Dec 17, 2025 · Updated 4 months ago
- [AAAI 2025] Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems ☆13 · May 5, 2025 · Updated 11 months ago
- ☆19 · Jun 4, 2024 · Updated last year
- ☆15 · Apr 16, 2025 · Updated last year
- !!!!(DEMO)!!!! !!! CHECK OUT THE NEW VERSİON !!! Counting Close People with Yolov7 ☆13 · Sep 14, 2022 · Updated 3 years ago
- A curated list of PhD, RA, and Intern openings in Computer Science (CS), Electrical & Computer Engineering (ECE), and Artificial Intellig… ☆21 · Sep 1, 2025 · Updated 8 months ago
- A C project template with support for CMake and Unity test framework ☆11 · Jun 12, 2018 · Updated 7 years ago
- ☆15 · Oct 9, 2022 · Updated 3 years ago
- As defined in Lubotzky, Philips and Sarnak ☆10 · Oct 25, 2022 · Updated 3 years ago
- DGCIT: Double Generative Adversarial Networks for Conditional Independence Testing ☆11 · Nov 22, 2023 · Updated 2 years ago
- ☆80 · Nov 19, 2024 · Updated last year
- Code and data to support Bamman et al. (2020), "A Dataset of Literary Coreference" (LREC) ☆10 · Dec 8, 2022 · Updated 3 years ago
- Codes for coreference-aware machine reading comprehension ☆13 · Mar 13, 2022 · Updated 4 years ago
- ☆14 · Jul 25, 2024 · Updated last year
- Jupyter notebooks to fine-tune Whisper models on Vietnamese using Colab and/or Kaggle and/or AWS EC2 ☆20 · Aug 15, 2025 · Updated 8 months ago
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference" ☆22 · Feb 16, 2025 · Updated last year
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆116 · Feb 9, 2024 · Updated 2 years ago
- [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset ☆113 · May 22, 2025 · Updated 11 months ago
- 🎓 Automatically Update CV Papers Daily using Github Actions (Update Every 12th hours) ☆12 · Updated this week
- Benchmarking Benchmark Leakage in Large Language Models ☆60 · May 20, 2024 · Updated last year
- Mix of Minimal Optimal Sets (MMOS) of dataset has two advantages for two aspects, higher performance and lower construction costs on math… ☆73 · Jul 27, 2024 · Updated last year
- MATLAB Adaptor packages for KINOVA® KORTEX™ robotic arms ☆15 · Sep 29, 2025 · Updated 7 months ago
- This repository provides simulator codes for predicting and tracking popular discussion threads on Reddit ☆21 · Sep 10, 2016 · Updated 9 years ago
- [EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners ☆27 · Dec 11, 2024 · Updated last year