FreedomIntelligence / TinyDeepSeekLinks

Reproduction of the complete process of DeepSeek-R1 on small-scale models, including Pre-training, SFT, and RL.

☆27

Alternatives and similar repositories for TinyDeepSeek

Users that are interested in TinyDeepSeek are comparing it to the libraries listed below

Sorting:

horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆73Updated 4 months ago
Zanette-Labs / SpeculativeRejection
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
☆45Updated 8 months ago
OpenSparseLLMs / Linear-MoE
☆104Updated 3 weeks ago
LightChen233 / reasoning-boundary
☆62Updated last week
testtimescaling / testtimescaling.github.io
"what, how, where, and how well? a survey on test-time scaling in large language models" repository
☆45Updated this week
RyanLiu112 / GenPRM
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆75Updated 3 weeks ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆73Updated 3 months ago
hemingkx / TokenSkip
TokenSkip: Controllable Chain-of-Thought Compression in LLMs
☆156Updated 3 months ago
dongguanting / Tool-Star
Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning
☆170Updated last week
Blueyee / Efficient-CoT-LRMs
Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process!
☆54Updated 2 months ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆108Updated 6 months ago
CJReinforce / PURE
Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆125Updated last week
Dereck0602 / Awesome_Test_Time_LLMs
☆109Updated 3 months ago
sail-sg / sdft
[ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".
☆122Updated 7 months ago
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆74Updated last week
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆116Updated 2 months ago
SkyworkAI / skywork-o1-prm-inference
☆63Updated 7 months ago
LINs-lab / DynMoE
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆110Updated 4 months ago
fscdc / Awesome-Efficient-Reasoning-Models
[arXiv 2025] Efficient Reasoning Models: A Survey
☆184Updated this week
TIGER-AI-Lab / verl-tool
A version of verl to support tool use
☆261Updated this week
Joshua-Ren / Learning_dynamics_LLM
☆139Updated last month
GAIR-NLP / ToRL
☆228Updated last month
ZhenweiAn / Dynamic_MoE
Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"
☆53Updated 10 months ago
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆84Updated 6 months ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆222Updated last month
maple-research-lab / SLOT
☆85Updated 2 weeks ago
Jikai0Wang / OPT-Tree
☆23Updated last month
abdelfattah-lab / SplitReason
☆16Updated this week
Zanette-Labs / efficient-reasoning
☆65Updated 2 months ago
RUCAIBox / EASYEP
☆19Updated 2 months ago