FreedomIntelligence / TinyDeepSeek
A reproduction of the complete DeepSeek-R1 training process on small-scale models, covering pre-training, SFT, and RL.
☆29 · Updated 10 months ago
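The description above outlines a three-stage pipeline (pre-training, then SFT, then RL). As a rough orientation, here is a minimal sketch of how such stages chain together; every name here (`Model`, `pretrain`, `sft`, `rl`) is a hypothetical stand-in for illustration, not TinyDeepSeek's actual API.

```python
# Minimal sketch of a three-stage R1-style pipeline:
# pre-training -> SFT -> RL. All names are hypothetical
# illustrations, not TinyDeepSeek's real interfaces.
from dataclasses import dataclass

@dataclass
class Model:
    """Stand-in for a small transformer checkpoint."""
    name: str
    stage: str = "init"

def pretrain(model: Model, corpus: list[str]) -> Model:
    # Stage 1: next-token prediction on a raw text corpus.
    model.stage = "pretrained"
    return model

def sft(model: Model, demos: list[tuple[str, str]]) -> Model:
    # Stage 2: supervised fine-tuning on (prompt, response)
    # pairs, e.g. distilled reasoning traces.
    model.stage = "sft"
    return model

def rl(model: Model, prompts: list[str]) -> Model:
    # Stage 3: reinforcement learning with a rule-based reward
    # (DeepSeek-R1 uses GRPO with correctness-based rewards).
    model.stage = "rl"
    return model

if __name__ == "__main__":
    m = pretrain(Model("tiny"), ["raw text"])
    m = sft(m, [("2+2?", "<think>...</think> 4")])
    m = rl(m, ["2+2?"])
    print(m)  # Model(name='tiny', stage='rl')
```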
Alternatives and similar repositories for TinyDeepSeek
Users who are interested in TinyDeepSeek are comparing it to the repositories listed below.
- ☆129 · Updated 8 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆153 · Updated 7 months ago
- qwen-nsa ☆87 · Updated 3 months ago
- ☆68 · Updated last year
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression ☆72 · Updated 10 months ago
- Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆67 · Updated last year
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning" ☆95 · Updated 3 months ago
- (ICLR 2026) Unveiling Super Experts in Mixture-of-Experts Large Language Models ☆35 · Updated 4 months ago
- ☆230 · Updated last month
- ZO2 (Zeroth-Order Offloading): Full Parameter Fine-Tuning of 175B LLMs with 18GB GPU Memory [COLM 2025] ☆200 · Updated 6 months ago
- ☆120 · Updated this week
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆91 · Updated 11 months ago
- Repository for "What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models" ☆86 · Updated this week
- A highly capable 2.4B lightweight LLM trained on only 1T tokens of pre-training data, with all details released. ☆223 · Updated 6 months ago
- One-shot Entropy Minimization ☆188 · Updated 7 months ago
- Efficient Mixture of Experts for LLM Paper List ☆166 · Updated 4 months ago
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆201 · Updated 2 months ago
- ☆333 · Updated 8 months ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling" ☆283 · Updated 11 months ago
- ☆55 · Updated 7 months ago
- Implementation of FP8/INT8 rollout for RL training without performance drop. ☆289 · Updated 3 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆143 · Updated 2 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆57 · Updated 8 months ago
- Official repository of "Learning to Reason under Off-Policy Guidance" ☆413 · Updated 4 months ago
- Towards a Unified View of Large Language Model Post-Training ☆200 · Updated 5 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆147 · Updated 10 months ago
- Chain of Thought (CoT) is so hot! And so long! We need a shorter reasoning process! ☆72 · Updated 10 months ago
- TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight) ☆429 · Updated 4 months ago
- A generalized framework for subspace tuning methods in parameter-efficient fine-tuning. ☆169 · Updated last week
- ☆209 · Updated 3 months ago