RyanLiu112 / compute-optimal-tts

Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".

☆273

Alternatives and similar repositories for compute-optimal-tts

Users who are interested in compute-optimal-tts are comparing it to the repositories listed below.


Gen-Verse / ReasonFlux
[NeurIPS 2025 Spotlight] ReasonFlux Series - ReasonFlux, ReasonFlux-PRM and ReasonFlux-Coder
☆494 Updated last month
knoveleng / open-rs
Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"
☆267 Updated 2 weeks ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆259 Updated 5 months ago
QwenLM / ParScale
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
☆449 Updated 5 months ago
ADaM-BJTU / O1-CODER
AN O1 REPLICATION FOR CODING
☆336 Updated 10 months ago
InternLM / OREAL
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆190 Updated 7 months ago
eddycmu / demystify-long-cot
☆322 Updated 5 months ago
ypwang61 / One-Shot-RLVR
[NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
☆371 Updated 2 weeks ago
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆302 Updated last month
RUC-GSAI / YuLan-Mini
A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.
☆218 Updated 3 months ago
ReTool-RL / ReTool
☆225 Updated 2 months ago
InternLM / POLAR
Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
☆158 Updated last month
ElliottYan / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆350 Updated 3 weeks ago
PRIME-RL / TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
☆878 Updated last month
ChenxinAn-fdu / POLARIS
Scaling RL on advanced reasoning models
☆620 Updated last week
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆191 Updated 3 weeks ago
sunblaze-ucb / Intuitor
Code for the paper: "Learning to Reason without External Rewards"
☆366 Updated 3 months ago
LeapLabTHU / limit-of-RLVR
repo for paper https://arxiv.org/abs/2504.13837
☆203 Updated 4 months ago
GAIR-NLP / ToRL
☆303 Updated 5 months ago
ruixin31 / Spurious_Rewards
☆334 Updated 3 months ago
SimpleBerry / LLaMA-O1
Large Reasoning Models
☆805 Updated 10 months ago
TsinghuaC3I / Unify-Post-Training
Towards a Unified View of Large Language Model Post-Training
☆170 Updated last month
OPPO-PersonalAI / Agent_Foundation_Models
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.
☆473 Updated last month
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆308 Updated last month
facebookresearch / sweet_rl
Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"
☆248 Updated 5 months ago
lzhxmu / CPPO
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025)
☆155 Updated 2 weeks ago
CSfufu / Revisual-R1
🚀ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, multimodal rei…
☆187 Updated 2 weeks ago
GAIR-NLP / LIMR
☆211 Updated 8 months ago
facebookresearch / RAM
A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).
☆295 Updated last week
inclusionAI / ASearcher
An Open-Source Large-Scale Reinforcement Learning Project for Search Agents
☆471 Updated 3 weeks ago