FreedomIntelligence / TinyDeepSeek
Reproduction of the complete process of DeepSeek-R1 on small-scale models, including Pre-training, SFT, and RL.
☆26Updated 2 months ago
Alternatives and similar repositories for TinyDeepSeek
Users who are interested in TinyDeepSeek compare it to the repositories listed below.
- ☆95Updated 2 weeks ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs☆147Updated 2 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆69Updated 3 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models☆97Updated 3 months ago
- ☆83Updated last month
- ☆63Updated 6 months ago
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…☆69Updated last week
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆73Updated this week
- qwen-nsa☆66Updated last month
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection☆45Updated 7 months ago
- [arXiv 2025] Efficient Reasoning Models: A Survey☆166Updated last week
- ☆23Updated last week
- ☆16Updated 3 weeks ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆123Updated this week
- A Sober Look at Language Model Reasoning☆63Updated last week
- ☆18Updated last month
- Unofficial implementations of block/layer-wise pruning methods for LLMs.☆69Updated last year
- ZO2 (Zeroth-Order Offloading): Full Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory☆95Updated last month
- ☆210Updated last week
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆129Updated last month
- ☆105Updated 2 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆212Updated this week
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆120Updated 7 months ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.☆107Updated this week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆215Updated 3 weeks ago
- ☆131Updated 3 weeks ago
- A version of verl to support tool use☆172Updated this week
- This is an implementation of the paper "Improve Mathematical Reasoning in Language Models by Automated Process Supervision" from Google de…☆32Updated 2 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆70Updated 2 months ago
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration☆49Updated 3 months ago