mkantwala / DeepSeek-R1-TrainingSuite
Advanced implementation of DeepSeek-R1 featuring Group Relative Policy Optimization (GRPO) for mathematical reasoning AI. Integrates safe distillation, modular reward systems, and efficient LoRA fine-tuning. Open-source Apache 2.0 licensed framework for developing aligned AI systems.
13 · Jan 29, 2025 · Updated last year

Alternatives and similar repositories for DeepSeek-R1-TrainingSuite

Users who are interested in DeepSeek-R1-TrainingSuite are comparing it to the libraries listed below.
