mkantwala / DeepSeek-R1-TrainingSuite

Advanced implementation of DeepSeek-R1 training that uses Group Relative Policy Optimization (GRPO) to build mathematical reasoning models. It integrates safe distillation, modular reward systems, and efficient LoRA fine-tuning. An open-source, Apache 2.0-licensed framework for developing aligned AI systems.
10 · Updated 2 months ago
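The blurb names GRPO and "modular reward systems" without showing how they fit together. Below is a minimal, self-contained sketch, not code taken from this repository. Independent reward functions are summed into one score per sampled completion. Then the group-relative advantages that GRPO uses in place of a learned value critic are computed: each reward is standardized against the other samples drawn for the same prompt. The reward functions `accuracy_reward` and `format_reward`, the `Answer:` marker, and the `<think>` tag convention are hypothetical choices made for illustration. The population standard deviation and the `eps` term are likewise assumptions, and implementations differ on these details.

```python
import math
from typing import Callable, List

# A reward component maps (completion, reference_answer) to a scalar score.
RewardFn = Callable[[str, str], float]


def accuracy_reward(completion: str, answer: str) -> float:
    """Hypothetical rule-based check: 1.0 if the text after the last 'Answer:' matches."""
    marker = "Answer:"
    if marker not in completion:
        return 0.0
    predicted = completion.rsplit(marker, 1)[1].strip()
    return 1.0 if predicted == answer.strip() else 0.0


def format_reward(completion: str, answer: str) -> float:
    """Hypothetical format check: small bonus for reasoning wrapped in <think> tags."""
    return 0.5 if "<think>" in completion and "</think>" in completion else 0.0


def combined_reward(completion: str, answer: str, fns: List[RewardFn]) -> float:
    # Modular design: the total reward is the sum of independent components,
    # so components can be added or removed without touching the trainer.
    return sum(fn(completion, answer) for fn in fns)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    # GRPO-style advantage: standardize each reward against its own group
    # (all completions sampled for the same prompt) instead of using a critic.
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)  # population std
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: one prompt ("What is 2+2?") with a group of four sampled completions.
group = [
    "<think>2+2 is 4</think> Answer: 4",
    "<think>guess</think> Answer: 5",
    "Answer: 4",
    "no answer given",
]
rewards = [combined_reward(c, "4", [accuracy_reward, format_reward]) for c in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Because advantages are centered on the group mean, completions that beat their siblings get a positive signal and the rest get a negative one. If every sample in a group earns the same reward, all advantages are zero, and that group contributes no policy-gradient signal.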

Alternatives and similar repositories for DeepSeek-R1-TrainingSuite:

Users who are interested in DeepSeek-R1-TrainingSuite are comparing it to the libraries listed below.