openai/grade-school-math

☆1,448

Alternatives and similar repositories for grade-school-math

Users who are interested in grade-school-math are comparing it to the repositories listed below.

hendrycks / math
The MATH Dataset (NeurIPS 2021)
☆1,376 · Sep 6, 2025 · Updated 10 months ago
hendrycks / test
Measuring Massive Multitask Language Understanding | ICLR 2021
☆1,601 · May 28, 2023 · Updated 3 years ago
arkilpatel / SVAMP
NAACL 2021: Are NLP Models really able to Solve Simple Math Word Problems?
☆142 · Jun 30, 2022 · Updated 4 years ago
openai / prm800k
800,000 step-level correctness labels on LLM solutions to MATH problems
☆2,151 · Jun 1, 2023 · Updated 3 years ago
chaochun / nlu-asdiv-dataset
☆52 · Jul 4, 2023 · Updated 3 years ago
suzgunmirac / BIG-Bench-Hard
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
☆566 · Jun 25, 2024 · Updated 2 years ago
openai / human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
☆3,316 · Jan 17, 2025 · Updated last year
FranxYao / chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,776 · Aug 4, 2024 · Updated last year
google / BIG-bench
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
☆3,249 · Jul 19, 2024 · Updated 2 years ago
OFA-Sys / gsm8k-ScRel
Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
☆268 · Sep 12, 2024 · Updated last year
google-deepmind / AQuA
An algebraic word problem dataset, with multiple choice questions annotated with rationales.
☆338 · Nov 2, 2017 · Updated 8 years ago
ruixiangcui / AGIEval
☆774 · Jun 13, 2024 · Updated 2 years ago
tatsu-lab / alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
☆2,006 · Aug 9, 2025 · Updated 11 months ago
sroy9 / mawps
Code for MAWPS: A Math Word Problem Repository
☆41 · Mar 23, 2023 · Updated 3 years ago
anthropics / hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,851 · Jun 17, 2025 · Updated last year
allenai / natural-instructions
Expanding natural instructions
☆1,045 · Dec 11, 2023 · Updated 2 years ago
huggingface / trl
Train transformer language models with reinforcement learning.
☆18,906 · Updated this week
TIGER-AI-Lab / Program-of-Thoughts
Data and Code for Program of Thoughts [TMLR 2023]
☆317 · May 15, 2024 · Updated 2 years ago
Timothyxxx / Chain-of-ThoughtsPapers
A trend starts from "Chain of Thought Prompting Elicits Reasoning in Large Language Models".
☆2,105 · Oct 5, 2023 · Updated 2 years ago
meta-math / MetaMath
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
☆455 · Feb 1, 2024 · Updated 2 years ago
EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆13,359 · Jul 13, 2026 · Updated last week
openai / lm-human-preferences
Code for the paper Fine-Tuning Language Models from Human Preferences
☆1,393 · Jul 25, 2023 · Updated 2 years ago
google-deepmind / mathematics_dataset
This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty.
☆1,959 · Dec 23, 2024 · Updated last year
bigscience-workshop / promptsource
Toolkit for creating, sharing and using natural language prompts.
☆3,028 · Oct 23, 2023 · Updated 2 years ago
yizhongw / self-instruct
Aligning pretrained language models with instruction data generated by themselves.
☆4,606 · Mar 27, 2023 · Updated 3 years ago
lupantech / dl4math
Resources of deep learning for mathematical reasoning (DL4MATH).
☆374 · Dec 22, 2023 · Updated 2 years ago
reasoning-machines / pal
PaL: Program-Aided Language Models (ICML 2023)
☆524 · Jun 30, 2023 · Updated 3 years ago
hkust-nlp / ceval
Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
☆1,862 · Jul 27, 2025 · Updated 11 months ago
CarperAI / trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,753 · Jan 8, 2024 · Updated 2 years ago
kojima-takeshi188 / zero_shot_cot
Prod Env
☆444 · Oct 9, 2023 · Updated 2 years ago
google-research / FLAN
☆1,565 · Jul 2, 2026 · Updated 3 weeks ago
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,834 · Jul 14, 2026 · Updated last week
sylinrl / TruthfulQA
TruthfulQA: Measuring How Models Imitate Human Falsehoods
☆934 · Jan 16, 2025 · Updated last year
OpenLMLab / GAOKAO-Bench
GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.
☆779 · Jan 7, 2025 · Updated last year
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆17,165 · Updated this week
GAIR-NLP / abel
SOTA Math Opensource LLM
☆335 · Dec 12, 2023 · Updated 2 years ago
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆24,502 · Updated this week
eric-mitchell / direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
☆2,898 · Aug 11, 2024 · Updated last year
LYH-YF / MWPToolkit
MWPToolkit is an open-source framework for math word problem (MWP) solvers.
☆166 · Sep 28, 2022 · Updated 3 years ago