XU-YIJIE / grpo-flat
Train your GRPO model with zero dataset and low resources; 8bit/4bit/LoRA/QLoRA supported, multi-GPU supported ...
☆65Updated last month
Alternatives and similar repositories for grpo-flat:
Users interested in grpo-flat are comparing it to the libraries listed below
- Dataset synthesis, model training, and evaluation for LLM mathematical problem-solving, with accompanying articles documenting the work.☆80Updated 6 months ago
- ☆50Updated 5 months ago
- From LLaMA to DeepSeek: GRPO/MTP implemented, with PT/SFT/LoRA/QLoRA included☆24Updated this week
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆177Updated last month
- Fine-tuning large language models with the DPO algorithm; simple and easy to get started.☆33Updated 8 months ago
- llm & rl☆73Updated this week
- Related works and background techniques for OpenAI o1☆217Updated 2 months ago
- Training an LLM from scratch on a single 24 GB GPU☆50Updated 5 months ago
- ☆66Updated last year
- Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat☆115Updated last year
- ☆74Updated 4 months ago
- ☆106Updated 8 months ago
- ☆33Updated last month
- PyTorch distributed training☆64Updated last year
- DPO training for Tongyi Qianwen (Qwen)☆40Updated 6 months ago
- ☆40Updated 7 months ago
- ☆134Updated 11 months ago
- SOTA RL fine-tuning solution for advanced math reasoning in LLMs☆91Updated this week
- Notes on reproducing LLM techniques from scratch☆175Updated 6 months ago
- ☆105Updated 4 months ago
- Adds an RLHF implementation to ChatGLM-6B, with line-by-line explanations of parts of the core code; the examples apply RLHF to short news headline generation and context-specific recommendation☆82Updated last year
- ☆113Updated 2 months ago
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models☆62Updated last month
- Simple and efficient multi-GPU fine-tuning of large models with DeepSpeed + Trainer☆123Updated last year
- An implementation of Transformer, BERT, GPT, and diffusion models for learning purposes☆152Updated 5 months ago
- NTK-scaled version of the ALiBi position encoding in Transformers.☆67Updated last year
- ☆97Updated last year
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆76Updated 4 months ago
- ☆186Updated this week
- Chinese instruction-tuning datasets☆129Updated 11 months ago