THUDM / T1Links

RL Scaling and Test-Time Scaling (ICML'25)

☆108

Alternatives and similar repositories for T1

Users that are interested in T1 are comparing it to the libraries listed below

Sorting:

GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆137Updated this week
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆102Updated 2 months ago
TIGER-AI-Lab / CritiqueFineTuning
Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]
☆163Updated this week
zitian-gao / SC-MCTS
Interpretable Contrastive Monte Carlo Tree Search Reasoning
☆49Updated 8 months ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆108Updated 7 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆74Updated 3 months ago
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆85Updated 6 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆154Updated 3 months ago
hkust-nlp / llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆138Updated 9 months ago
jwhj / OREO
☆114Updated 5 months ago
WindyLee0822 / Process_Q_Model
official implementation of paper "Process Reward Model with Q-value Rankings"
☆60Updated 5 months ago
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆113Updated this week
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains
☆147Updated last month
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆98Updated 4 months ago
SynthLabsAI / big-math
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
☆58Updated 4 months ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆102Updated 4 months ago
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆82Updated last month
GAIR-NLP / AIME-Preview
☆70Updated 4 months ago
hkust-nlp / PreSelect
[ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches
☆51Updated 4 months ago
icip-cas / Verifier-Engineering
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
☆60Updated 7 months ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆125Updated 3 months ago
Open-Source-O1 / o1_Reasoning_Patterns_Study
☆102Updated 7 months ago
yyDing1 / ScaleQuest
[ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
☆63Updated 8 months ago
zankner / CLoud
Critique-out-Loud Reward Models
☆67Updated 8 months ago
kanishkg / cognitive-behaviors
☆199Updated 3 months ago
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆165Updated last month
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆100Updated last month
tongyx361 / Awesome-LLM4Math
Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…
☆132Updated last year
efficientscaling / Z1
Repo for "Z1: Efficient Test-time Scaling with Code"
☆63Updated 3 months ago
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆74Updated last month