A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low accuracy in solving these problems.
☆26 · Feb 14, 2025 · Updated last year
Alternatives and similar repositories for HARDMath
Users who are interested in HARDMath are comparing it to the repositories listed below.
- The rule-based evaluation subset and code implementation of Omni-MATH ☆27 · Dec 23, 2024 · Updated last year
- The official repository of the Omni-MATH benchmark. ☆93 · Dec 22, 2024 · Updated last year
- ☆30 · Dec 27, 2024 · Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*… ☆13 · Aug 8, 2025 · Updated 7 months ago
- The OlymMATH dataset ☆24 · Jun 1, 2025 · Updated 9 months ago
- ☆85 · Jan 25, 2025 · Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆73 · Feb 25, 2025 · Updated last year
- ☆23 · Jan 31, 2025 · Updated last year
- [npj Digital Medicine'25] Continuous sleep depth index annotation with deep learning yields novel digital biomarkers for sleep health ☆16 · Apr 13, 2025 · Updated 11 months ago
- Repository for the paper: Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning ☆18 · Feb 21, 2025 · Updated last year
- ☆14 · Oct 21, 2024 · Updated last year
- ☆56 · Jun 23, 2025 · Updated 8 months ago
- Modern development with Python in 2024 ☆12 · Updated this week
- [ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie… ☆187 · Jun 8, 2025 · Updated 9 months ago
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆38 · Jul 25, 2024 · Updated last year
- A curated list of cutting-edge research papers and resources on Long Chain-of-Thought (CoT) Reasoning with Tools. ☆46 · Dec 17, 2025 · Updated 3 months ago
- ☆19 · Jun 4, 2024 · Updated last year
- LaTeX Beamer template crafted for University of Illinois Chicago ☆11 · Dec 7, 2024 · Updated last year
- ☆14 · Apr 16, 2025 · Updated 11 months ago
- !!!!(DEMO)!!!! !!! CHECK OUT THE NEW VERSİON !!! Counting Close People with Yolov7 ☆13 · Sep 14, 2022 · Updated 3 years ago
- A C project template with support for CMake and Unity test framework ☆11 · Jun 12, 2018 · Updated 7 years ago
- jupyter notebooks to fine tune whisper models on Vietnamese using Colab and/or Kaggle and/or AWS EC2 ☆19 · Aug 15, 2025 · Updated 7 months ago
- ☆15 · Oct 9, 2022 · Updated 3 years ago
- ☆79 · Nov 19, 2024 · Updated last year
- [IJCAI 2023] The official repo of paper 'Automatic Truss Design with Reinforcement Learning' ☆19 · Jun 19, 2023 · Updated 2 years ago
- Code and data to support Bamman et al. (2020), "A Dataset of Literary Coreference" (LREC) ☆10 · Dec 8, 2022 · Updated 3 years ago
- Codes for coreference-aware machine reading comprehension ☆13 · Mar 13, 2022 · Updated 4 years ago
- ☆15 · Jul 1, 2020 · Updated 5 years ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆116 · Feb 9, 2024 · Updated 2 years ago
- Logical Operations On Puzzles: Simple Iterative Reasoning Tests for LLMs first through wordgrids ☆18 · Feb 19, 2025 · Updated last year
- 🎓Automatically Update CV Papers Daily using Github Actions (Update Every 12th hours) ☆12 · Updated this week
- ☆12 · May 13, 2021 · Updated 4 years ago
- Mix of Minimal Optimal Sets (MMOS) of dataset has two advantages for two aspects, higher performance and lower construction costs on math… ☆74 · Jul 27, 2024 · Updated last year
- Benchmarking Benchmark Leakage in Large Language Models ☆60 · May 20, 2024 · Updated last year
- This repository provides simulator codes for predicting and tracking popular discussion threads on Reddit ☆20 · Sep 10, 2016 · Updated 9 years ago
- [EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners ☆26 · Dec 11, 2024 · Updated last year
- FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning (EMNLP 2025) ☆58 · Oct 10, 2025 · Updated 5 months ago
- Math 228A 2019 Fall ☆16 · Dec 4, 2019 · Updated 6 years ago
- Official codebase for the ACL 2025 Findings paper: Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval. ☆20 · Jul 26, 2025 · Updated 7 months ago