eth-sri/matharena

Evaluation of LLMs on latest math competitions

☆272

Alternatives and similar repositories for matharena

Users that are interested in matharena are comparing it to the libraries listed below.

insait-institute / open-proof-corpus
This repository contains the code for the paper The Open Proof Corpus: Building a Large-Scale, Human-Validated Dataset of LLM-Generated P…
☆18 · Aug 4, 2025 · Updated 11 months ago
GAIR-NLP / AIME-Preview
☆84 · Mar 11, 2025 · Updated last year
zwhe99 / DeepMath
A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
☆294 · Sep 25, 2025 · Updated 9 months ago
huggingface / Math-Verify
☆1,170 · Jan 10, 2026 · Updated 6 months ago
cpldcpu / LRMTokenEconomy
Measuring Thinking Efficiency in Reasoning Models - Research Repository
☆39 · Dec 2, 2025 · Updated 7 months ago
uq-project / UQ
UQ: Assessing Language Models on Unsolved Questions
☆30 · Aug 26, 2025 · Updated 10 months ago
SynthLabsAI / big-math
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
☆74 · Feb 25, 2025 · Updated last year
Aider-AI / polyglot-benchmark
Coding problems used in aider's polyglot benchmark
☆222 · Dec 22, 2024 · Updated last year
microsoft / MetaST
☆26 · Jul 25, 2023 · Updated 2 years ago
ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆277 · Apr 26, 2024 · Updated 2 years ago
seewoo5 / awesome-ai-for-math
List of awesome works that use AI for mathematical discoveries.
☆70 · Updated this week
JanTempus / tokenisation_lp
☆15 · May 20, 2026 · Updated 2 months ago
LiveCodeBench / LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
☆911 · Jul 16, 2025 · Updated last year
analokmaus / kaggle-aimo2-fast-math-r1
Kaggle AIMO2 solution with token-efficient reasoning LLM recipes
☆50 · Aug 7, 2025 · Updated 11 months ago
aisa-group / PostTrainBench
Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours
☆463 · Updated this week
GavinZhengOI / LiveCodeBench-Pro
☆176 · Dec 13, 2025 · Updated 7 months ago
OpenPipe / rl-experiments
OpenPipe Reinforcement Learning Experiments
☆34 · Mar 14, 2025 · Updated last year
nikhilchandak / answer-matching
Code for 'Answer Matching Outperforms Multiple Choice for Language Model Evaluation' paper
☆18 · Jul 4, 2025 · Updated last year
google-deepmind / superhuman
☆777 · Jun 5, 2026 · Updated last month
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆92 · Nov 18, 2025 · Updated 8 months ago
roozbeh-mohit / IMO-Steps
☆31 · Jul 16, 2025 · Updated last year
RenzeLou / Muffin
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
☆16 · Oct 31, 2024 · Updated last year
NVIDIA-NeMo / Skills
A project to improve skills of large language models
☆1,011 · Updated this week
rllm-org / rllm
Democratizing Reinforcement Learning for LLMs
☆5,708 · Updated this week
Newclid / Newclid
Automatic solver for plane geometry problems.
☆94 · Feb 24, 2026 · Updated 4 months ago
zai-org / glm-simple-evals
GLM-SIMPLE-EVALS: The evaluation repository for the GLM-4.5 series of models by Z.ai.
☆41 · Oct 17, 2025 · Updated 9 months ago
ypwang61 / One-Shot-RLVR
[NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
☆444 · Mar 11, 2026 · Updated 4 months ago
passing2961 / DialogCC
Official code and dataset for our NAACL 2024 paper: DialogCC: An Automated Pipeline for Creating High-Quality Multi-modal Dialogue Datase…
☆13 · Jun 24, 2024 · Updated 2 years ago
google-deepmind / alphaevolve_repository_of_problems
☆228 · Jul 11, 2026 · Updated last week
kmill / LeanTeX
Lean 4 library for pretty printing expressions as LaTeX
☆38 · Mar 5, 2025 · Updated last year
faabian / algebraic-combinatorics
Automatic textbook formalization of Grinberg Algebraic Combinatorics
☆17 · May 7, 2026 · Updated 2 months ago
lupantech / ineqmath
Solving Inequality Proofs with Large Language Models.
☆61 · Dec 15, 2025 · Updated 7 months ago
math-inc / Sphere-Packing-Lean
A Lean formalisation of Maryna Viazovska's Fields Medal-winning solution to the sphere packing problem in dimensions 8 and 24.
☆70 · Apr 7, 2026 · Updated 3 months ago
SWE-bench / SWE-bench
SWE-bench: Can Language Models Resolve Real-world Github Issues?
☆5,467 · Apr 1, 2026 · Updated 3 months ago
Goedel-LM / Goedel-Prover-V2
☆184 · Aug 27, 2025 · Updated 10 months ago
CMU-AIRe / QED-Nano
Training tiny models to prove hard theorems
☆81 · Mar 5, 2026 · Updated 4 months ago
ByteDance-Seed / BFS-Prover-V2
☆50 · Oct 9, 2025 · Updated 9 months ago
microsoft / rStar
☆1,423 · Sep 12, 2025 · Updated 10 months ago
huggingface / ioi
☆42 · Mar 26, 2025 · Updated last year