keirp/OpenWebMath

keirp / OpenWebMath

☆173

Alternatives and similar repositories for OpenWebMath

Users interested in OpenWebMath are comparing it to the repositories listed below.

yifanzhang-pro / AutoMathText
[ACL 2025 Findings] Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts (https://huggingface.co/papers…
☆92 · Nov 23, 2025 · Updated 8 months ago
GAIR-NLP / MathPile
[NeurIPS D&B 2024] Generative AI for Math: MathPile
☆418 · Apr 4, 2025 · Updated last year
ChengpengLi1003 / DotaMath
☆30 · Dec 27, 2024 · Updated last year
iiis-ai / IterativeQuestionComposing
[AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003)
☆23 · Oct 2, 2025 · Updated 9 months ago
Zhenwen-NLP / MathChat
Official code and data repository of MathChat: MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Inte…
☆22 · Jun 3, 2024 · Updated 2 years ago
chatnoir-eu / chatnoir-resiliparse
A robust web archive analytics toolkit
☆144 · Jul 20, 2026 · Updated last week
liuchengwucn / FIMO
☆38 · Jun 30, 2026 · Updated 3 weeks ago
conceptmath / conceptmath
[ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large …
☆26 · May 29, 2024 · Updated 2 years ago
hkust-nlp / llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆150 · Sep 20, 2024 · Updated last year
huggingface / datablations
Scaling Data-Constrained Language Models
☆345 · Jun 28, 2025 · Updated last year
ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆277 · Apr 26, 2024 · Updated 2 years ago
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆80 · Oct 9, 2025 · Updated 9 months ago
ChenghaoMou / text-dedup
All-in-one text de-duplication
☆765 · Mar 9, 2026 · Updated 4 months ago
hendrycks / math
The MATH Dataset (NeurIPS 2021)
☆1,377 · Sep 6, 2025 · Updated 10 months ago
mlfoundations / dclm
DataComp for Language Models
☆1,455 · Sep 9, 2025 · Updated 10 months ago
commoncrawl / ia-web-commons
Web archiving utility library
☆11 · Jul 21, 2026 · Updated last week
LLM360 / MegaMath
[COLM 2025] An Open Math Pre-training Dataset with 370B Tokens.
☆110 · Apr 4, 2025 · Updated last year
whyNLP / Conic10K
Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings.
☆33 · Dec 6, 2023 · Updated 2 years ago
microsoft / RedStone
The RedStone repository includes code for preparing extensive datasets used in training large language models.
☆161 · Apr 21, 2026 · Updated 3 months ago
all-the-noises / eval-arena
☆34 · Mar 21, 2026 · Updated 4 months ago
princeton-nlp / QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
☆204 · Dec 8, 2025 · Updated 7 months ago
MARIO-Math-Reasoning / Super_MARIO
☆341 · Jun 5, 2025 · Updated last year
allenai / dolma
Data and tools for generating and inspecting OLMo pre-training data.
☆1,528 · Nov 5, 2025 · Updated 8 months ago
microsoft / rho
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
☆471 · Apr 18, 2024 · Updated 2 years ago
wenhuchen / TheoremQA
The dataset and code for paper: TheoremQA: A Theorem-driven Question Answering dataset
☆161 · Apr 23, 2024 · Updated 2 years ago
GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆61 · May 20, 2024 · Updated 2 years ago
yegcjs / mixinglaws
☆113 · Jul 15, 2025 · Updated last year
zhangir-azerbayev / MetaMath
☆11 · Oct 11, 2023 · Updated 2 years ago
rookie-joe / PDA
☆36 · Jan 10, 2025 · Updated last year
EleutherAI / math-lm
☆1,098 · Mar 12, 2024 · Updated 2 years ago
GAIR-NLP / ProX
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
☆271 · Jul 8, 2025 · Updated last year
huggingface / cosmopedia
☆572 · Nov 20, 2024 · Updated last year
lm-sys / llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
☆325 · Dec 20, 2023 · Updated 2 years ago
mathllm / MathCoder
[MathCoder, MathCoder-VL] Family of LLMs/LMMs for mathematical reasoning.
☆340 · Oct 18, 2025 · Updated 9 months ago
zhaoyu-li / DL4TP
[COLM 2024] A Survey on Deep Learning for Theorem Proving
☆228 · May 28, 2025 · Updated last year
fzyzcjy / ai_math_paper_list
AI for Mathematics Paper List
☆17 · Jan 14, 2025 · Updated last year
roozbeh-mohit / IMO-Steps
☆31 · Jul 16, 2025 · Updated last year
albertqjiang / Portal-to-ISAbelle
https://albertqjiang.github.io/Portal-to-ISAbelle/
☆58 · Sep 6, 2023 · Updated 2 years ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,231 · Updated this week