yyDing1/ScaleQuest

[ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.

☆69

Alternatives and similar repositories for ScaleQuest

Users who are interested in ScaleQuest are comparing it to the repositories listed below.

mathllm / MathCoder2
☆71 · Oct 16, 2024 · Updated last year
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆120 · Dec 10, 2024 · Updated last year
LHL3341 / MetaLadder
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025)
☆12 · Apr 18, 2025 · Updated last year
ChengpengLi1003 / DotaMath
☆30 · Dec 27, 2024 · Updated last year
Zhenwen-NLP / MathChat
Official code and data repository of MathChat: MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Inte…
☆22 · Jun 3, 2024 · Updated 2 years ago
koalazf99 / nanoverl
Collections of RLxLM experiments using minimal codes
☆14 · Feb 17, 2025 · Updated last year
ChangyuChen347 / MaskedThought
[ACL 2024] Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
☆27 · Jul 9, 2024 · Updated 2 years ago
domaineval / DomainEval
DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …
☆13 · Dec 12, 2024 · Updated last year
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆94 · Dec 22, 2024 · Updated last year
OpenCoder-llm / opc_data_filtering
Heuristic filtering framework for RefineCode
☆87 · Mar 13, 2025 · Updated last year
eddycmu / demystify-long-cot
☆336 · May 31, 2025 · Updated last year
feiyang-k / AutoScale
Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…
☆14 · Aug 8, 2025 · Updated 11 months ago
liushulinle / UloRL
An Ultra-Long Output Reinforcement Learning Approach
☆23 · Jul 31, 2025 · Updated 11 months ago
WindyLee0822 / CTG
Source code of “Reinforcement Learning with Token-level Feedback for Controllable Text Generation” (NAACL 2024)
☆17 · Dec 8, 2024 · Updated last year
Jikai0Wang / OPT-Tree
☆30 · May 24, 2025 · Updated last year
hsajjad / ConceptX
Analyzing Latent Concept in Pre-trained Transformer Models
☆12 · Jul 18, 2022 · Updated 4 years ago
GuanghaoYe / Emergence-of-Thinking
☆55 · Feb 11, 2025 · Updated last year
ElvishElvis / LCA-on-the-line
LCA-on-the-line (ICML 2024 Oral)
☆14 · Feb 13, 2025 · Updated last year
RUCAIBox / JiuZhang3.0
The code and data for the paper JiuZhang3.0
☆49 · May 26, 2024 · Updated 2 years ago
SynthLabsAI / big-math
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
☆74 · Feb 25, 2025 · Updated last year
linjh1118 / LLM-Research
An LLM paper note list.
☆19 · Apr 6, 2024 · Updated 2 years ago
yifanzhang-pro / AutoMathText
[ACL 2025 Findings] Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts (https://huggingface.co/papers…
☆92 · Nov 23, 2025 · Updated 7 months ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189 · Jul 23, 2025 · Updated last year
GAIR-NLP / ProX
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
☆270 · Jul 8, 2025 · Updated last year
satori-reasoning / Satori-SWE
☆21 · May 30, 2025 · Updated last year
RUCAIBox / OlymMATH
The OlymMATH dataset
☆24 · Jun 1, 2025 · Updated last year
iiis-ai / IterativeQuestionComposing
[AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003)
☆23 · Oct 2, 2025 · Updated 9 months ago
illidanlab / ABD
[ICML 2023] Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
☆24 · Jul 7, 2024 · Updated 2 years ago
songmzhang / DSKDv2
The official implementation of the paper "A Dual-Space Framework for General Knowledge Distillation of Large Language Models".
☆18 · Jan 4, 2026 · Updated 6 months ago
KodCode-AI / code-r1
Reproducing R1 for Code with Reliable Rewards
☆13 · Apr 9, 2025 · Updated last year
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆146 · Oct 27, 2024 · Updated last year
abdelfattah-lab / SplitReason
☆20 · Mar 18, 2026 · Updated 4 months ago
LCM-Lab / LOGO
Code for paper: Long cOntext aliGnment via efficient preference Optimization
☆26 · Oct 10, 2025 · Updated 9 months ago
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆86 · May 21, 2025 · Updated last year
OpenNLG / OpenBA-v2
OpenBA-V2: 3B LLM (Large Language Model) with T5 architecture, utilizing model pruning technique and continuing pretraining from OpenBA-1…
☆25 · May 10, 2024 · Updated 2 years ago
jonathanherzig / semantic-parsing-annotation
Author implementation of the paper "Don’t paraphrase, detect! Rapid and Effective Data Collection for Semantic Parsing"
☆20 · Oct 5, 2020 · Updated 5 years ago
wenzhe-li / Self-MoA
☆17 · Feb 4, 2025 · Updated last year
ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆277 · Apr 26, 2024 · Updated 2 years ago
neulab / data-agora
[ACL 2025 Main] Official Repository for "Evaluating Language Models as Synthetic Data Generators"
☆41 · Dec 13, 2024 · Updated last year