GAIR-NLP/MegaScience

GAIR-NLP / MegaScience

[COLM 2026] MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

☆123

Alternatives and similar repositories for MegaScience

Users who are interested in MegaScience are comparing it to the repositories listed below.

GAIR-NLP / lm-open-science-evaluation
Reproducible and flexible LLM evaluations for scientific reasoning.
☆29 · Jul 23, 2025 · Updated 11 months ago
GAIR-NLP / self-improvement-reversal
☆13 · Jul 14, 2024 · Updated 2 years ago
GAIR-NLP / cs2916
☆28 · Mar 27, 2025 · Updated last year
GAIR-NLP / Safety-J
Safety-J: Evaluating Safety with Critique
☆16 · Jul 28, 2024 · Updated last year
RZFan525 / Awesome-ScalingLaws
A curated list of awesome resources dedicated to Scaling Laws for LLMs
☆84 · Apr 10, 2023 · Updated 3 years ago
GAIR-NLP / DataEvolve
☆31 · Mar 15, 2026 · Updated 4 months ago
GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆61 · May 20, 2024 · Updated 2 years ago
byronBBL / Context-DPO
Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness"
☆23 · Feb 17, 2025 · Updated last year
KehanGuo2 / MolPuzzle
[NeurIPS 24] Can LLMs Solve Molecule Puzzles? A Multimodal Benchmark for Molecular Structure Elucidation
☆20 · Jan 2, 2026 · Updated 6 months ago
Leey21 / CipherBank
☆13 · Jun 13, 2025 · Updated last year
koalazf99 / nanoverl
Collections of RLxLM experiments using minimal code
☆14 · Feb 17, 2025 · Updated last year
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆227 · Nov 27, 2025 · Updated 7 months ago
GAIR-NLP / Data-Darwinism
[ACL 2026] This is the repo of Data Darwinism.
☆26 · Apr 16, 2026 · Updated 3 months ago
NUSTM / LLMs-Waver-In-Judgments
☆12 · Sep 23, 2024 · Updated last year
Leey21 / A-Data-Centric-Study
☆18 · Mar 2, 2026 · Updated 4 months ago
GAIR-NLP / MoPS
[ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation"
☆46 · Jul 19, 2024 · Updated 2 years ago
GAIR-NLP / LIMI
LIMI: Less is More for Agency
☆162 · Oct 14, 2025 · Updated 9 months ago
Andrewzh112 / AI-Research-Interview-Lab
☆31 · Nov 14, 2025 · Updated 8 months ago
GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆37 · Jun 1, 2024 · Updated 2 years ago
GAIR-NLP / ProX
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
☆271 · Jul 8, 2025 · Updated last year
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆80 · Oct 9, 2025 · Updated 9 months ago
LLM360 / MegaMath
[COLM 2025] An Open Math Pre-training Dataset with 370B Tokens.
☆110 · Apr 4, 2025 · Updated last year
GAIR-NLP / alignment-for-honesty
☆78 · May 22, 2024 · Updated 2 years ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189 · Jul 23, 2025 · Updated 11 months ago
Meirtz / BabyBLUE-llm
[COLING 2025] Official repo of paper: "Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jail…
☆12 · Jul 26, 2024 · Updated last year
QizhiPei / BioMatrix
☆41 · Jun 23, 2026 · Updated 3 weeks ago
cholin01 / LaMGen
☆29 · Apr 21, 2026 · Updated 3 months ago
GAIR-NLP / PC-Agent-E
[ICLR 2026] Efficient Agent Training for Computer Use
☆146 · Sep 5, 2025 · Updated 10 months ago
koalazf99 / Awesome-DataCentric-LLM
Trending projects & awesome papers about data-centric llm studies.
☆40 · May 20, 2025 · Updated last year
suu990901 / KlearReasoner
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
☆82 · Dec 25, 2025 · Updated 6 months ago
ByteDance-Seed / DATAMASK
Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning
☆21 · Jan 4, 2026 · Updated 6 months ago
MiniMax-AI / SynLogic
[NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
☆204 · Jul 7, 2025 · Updated last year
GAIR-NLP / ReAlign
Reformatted Alignment
☆111 · Sep 23, 2024 · Updated last year
GAIR-NLP / weak-to-strong-reasoning
☆59 · Sep 2, 2024 · Updated last year
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆106 · Mar 6, 2025 · Updated last year
SinclairCoder / do-research-in-AI
A repository of useful research/skill-upgrading talks or articles in NLP/CV/AI Area (in Chinese).
☆96 · Jul 27, 2024 · Updated last year
MetaStone-AI / MetaStone-S1
The open-source code of MetaStone-S1.
☆106 · Aug 1, 2025 · Updated 11 months ago
Essential-AI / eai-taxonomy
☆59 · Aug 19, 2025 · Updated 11 months ago
GAIR-NLP / MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
☆149 · Apr 9, 2025 · Updated last year