cnsdqd-dyb / Guide-GRPO

Aims for memory-efficient training on consumer GPUs (24GB VRAM). Optimizes language models through guidance tokens in reasoning chains; based on DeepSeekRL-Extended.
29 | Feb 23, 2025 | Updated 11 months ago

Alternatives and similar repositories for Guide-GRPO

Users that are interested in Guide-GRPO are comparing it to the libraries listed below
