XU-YIJIE / grpo-flat

Train your grpo with zero dataset and low resources, 8bit/4bit/lora/qlora supported, multi-gpu supported ...

☆79

Alternatives and similar repositories for grpo-flat

Users that are interested in grpo-flat are comparing it to the libraries listed below

chunhuizhang / llm_rl
llm & rl
☆246 Updated 3 weeks ago
yuanzhoulvpi2017 / nano_rl
Custom reward development on top of verl
☆128 Updated 6 months ago
akaihaoshuai / baby-llama2-chinese_cybertron
Train an LLM from scratch on a single 24 GB GPU
☆55 Updated 4 months ago
percent4 / llm_math_solver
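This project covers dataset synthesis, model training, and evaluation for LLM math problem solving, along with notes on related articles.
☆97 Updated last year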
HarderThenHarder / RLLoggingBoard
A visualization tool for deeper understanding and easier debugging of RLHF training.
☆265 Updated 9 months ago
yuanzhoulvpi2017 / SentenceEmbedding
☆119 Updated last year
jackfsuia / nanoRLHF
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and reproducing DeepSeek R1-Zero.
☆74 Updated 9 months ago
yanqiangmiffy / how-to-train-tokenizer
How to train an LLM tokenizer
☆154 Updated 2 years ago
suu990901 / LLaMA-MiLe-Loss
Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models
☆65 Updated 9 months ago
owenliang / qwen-dpo
DPO training for Tongyi Qianwen (Qwen)
☆58 Updated last year
taishan1994 / pytorch-distributed-NLP
Distributed training with PyTorch
☆72 Updated 2 years ago
hengjiUSTC / learn-llm
☆115 Updated last year
RethinkFun / trian_ppo
☆125 Updated last year
a-m-team / a-m-models
a-m-team's exploration in large language modeling
☆192 Updated 5 months ago
Pillars-Creation / ChatGLM-RLHF-LoRA-RM-PPO
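Adds an RLHF implementation to ChatGLM-6B, with line-by-line explanations of some of the core code. The examples cover short news headline generation and an RLHF implementation of recommendation for a given context.
☆88 Updated 2 years ago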
Mxoder / LLM-from-scratch
Notes on reproducing LLM-related work from scratch
☆236 Updated 6 months ago
ADaM-BJTU / OpenRFT
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
☆153 Updated 10 months ago
lqtrung1998 / mwp_ReFT
☆549 Updated 10 months ago
wjn1996 / Awesome-LLM-Reasoning-Openai-o1-Survey
Related work and background techniques for OpenAI o1
☆221 Updated 10 months ago
CASIA-LM / MoDS
☆146 Updated last year
XU-YIJIE / hobo-llm-from-scratch
From Llama to Deepseek, grpo/mtp implemented. With pt/sft/lora/qlora included
☆30 Updated 7 months ago
sugarandgugu / Simple-Trl-Training
Fine-tune large language models with the DPO algorithm; simple and easy to get started.
☆46 Updated last year
l294265421 / alpaca-rlhf
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
☆115 Updated 2 years ago
Wangmerlyn / MCTS-GSM8k-Demo
A repo showcasing the use of MCTS with LLMs to solve gsm8k problems
☆91 Updated last week
bytarnish / AGILE
☆162 Updated 10 months ago
lansinuote / Simple_LLM_PPO
☆47 Updated last year
Qihoo360 / Light-IF
☆39 Updated 2 months ago
km1994 / llms_paper
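This repository mainly collects reading notes on top-conference papers relevant to LLM algorithm engineers (multimodal, PEFT, few-shot QA, RAG, LMM interpretability, Agents, CoT)
☆367 Updated last year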
CSHaitao / ChatGLM_mutli_gpu_tuning
Simple, efficient multi-GPU fine-tuning of large models with DeepSpeed + Trainer
☆129 Updated 2 years ago
Glanvery / LLM-Travel
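Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to understanding, discussing, and implementing the techniques, principles, and applications related to large models.
☆352 Updated last year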