protagolabs / odyssey-math

☆85

Alternatives and similar repositories for odyssey-math

Users who are interested in odyssey-math are comparing it to the repositories listed below.

mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆100 Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆124 Updated last year
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆90 Updated last year
hkust-nlp / llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆143 Updated last year
ars22 / scaling-LLM-math-synthetic-data
Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"
☆31 Updated last year
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆169 Updated 2 months ago
SynthLabsAI / big-math
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
☆68 Updated 9 months ago
YuxiXie / SelfEval-Guided-Decoding
☆103 Updated last year
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆134 Updated last year
lmarena / PPE
☆59 Updated 6 months ago
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆76 Updated 6 months ago
roeehendel / icl_task_vectors
☆102 Updated 2 years ago
Zayne-sprague / MuSR
☆56 Updated last year
TIGER-AI-Lab / LongICLBench
Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025]
☆110 Updated 9 months ago
allenai / Lila
A unified benchmark for math reasoning
☆89 Updated 2 years ago
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆66 Updated last year
TIGER-AI-Lab / TheoremQA
The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023)
☆37 Updated last year
QingruZhang / PASTA
PASTA: Post-hoc Attention Steering for LLMs
☆129 Updated last year
swj0419 / in-context-pretraining
☆54 Updated last year
wenhuchen / TheoremQA
The dataset and code for paper: TheoremQA: A Theorem-driven Question Answering dataset
☆160 Updated last year
wellecks / lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
☆24 Updated last year
da03 / implicit_chain_of_thought
☆139 Updated last year
princeton-nlp / USACO
Can Language Models Solve Olympiad Programming?
☆122 Updated 10 months ago
ryoungj / ObsScaling
[NeurIPS'24 Spotlight] Observational Scaling Laws
☆59 Updated last year
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆149 Updated last year
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆179 Updated 6 months ago
keirp / OpenWebMath
☆166 Updated last year
cyzhh / MMOS
Mix of Minimal Optimal Sets (MMOS) of dataset has two advantages for two aspects, higher performance and lower construction costs on math…
☆74 Updated last year
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆84 Updated last year
HKUNLP / STRING
[ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?"
☆78 Updated last year