XU-YIJIE / hobo-llm-from-scratch

From Llama to DeepSeek, with GRPO/MTP implemented. Includes PT/SFT/LoRA/QLoRA.

☆30

Alternatives and similar repositories for hobo-llm-from-scratch

Users who are interested in hobo-llm-from-scratch are comparing it to the libraries listed below.

XU-YIJIE / grpo-flat
Train GRPO with no dataset and low resources; 8-bit/4-bit/LoRA/QLoRA supported, multi-GPU supported ...
☆79Updated 6 months ago
jackfsuia / nanoRLHF
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and reproducing DeepSeek R1-Zero.
☆74Updated 9 months ago
hengjiUSTC / learn-llm
☆115Updated last year
yanqiangmiffy / how-to-train-tokenizer
How to train an LLM tokenizer
☆154Updated 2 years ago
akaihaoshuai / baby-llama2-chinese_cybertron
Train an LLM from scratch on a single 24 GB GPU
☆55Updated 4 months ago
SkyworkAI / skywork-o1-prm-inference
☆65Updated 11 months ago
ssbuild / llm_rlhf
Implements reinforcement learning training for LLMs such as GPT-2, LLaMA, and BLOOM
☆26Updated 2 years ago
HarderThenHarder / RLLoggingBoard
A visualization tool for deeper understanding and easier debugging of RLHF training.
☆265Updated 9 months ago
a-m-team / a-m-models
a-m-team's exploration in large language modeling
☆192Updated 5 months ago
CASIA-LM / MoDS
☆146Updated last year
taishan1994 / pytorch-distributed-NLP
PyTorch distributed training
☆72Updated 2 years ago
suu990901 / LLaMA-MiLe-Loss
Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models
☆65Updated 9 months ago
l294265421 / alpaca-rlhf
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
☆115Updated 2 years ago
the-seeds / LLaMA-Factory-Doc
LLaMA Factory Document
☆154Updated 2 weeks ago
iiis-turing-llm / llm-training-calculator
☆53Updated last year
cavalierlulu / rag_survey
☆125Updated last year
OpenBMB / UltraEval
[ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.
☆252Updated last year
genggui001 / Megatron-DeepSpeed-Llama
☆84Updated 2 years ago
sugarandgugu / Simple-Trl-Training
Fine-tune large language models with the DPO algorithm; simple and easy to get started.
☆46Updated last year
owenliang / qwen-dpo
DPO training for Tongyi Qianwen (Qwen)
☆58Updated last year
BrendanGraham14 / mcts-llm
☆130Updated last year
nick7nlp / Counting-Stars
Counting-Stars (★)
☆83Updated 5 months ago
multimodal-art-projection / Megatron-LM-NEO
☆40Updated last year
Mxoder / Maxs-Awesome-Datasets
Max's awesome datasets
☆52Updated 2 months ago
RUC-GSAI / Llama-3-SynE
Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve …
☆34Updated 5 months ago
wjn1996 / Awesome-LLM-Reasoning-Openai-o1-Survey
Related work and background techniques for OpenAI o1
☆221Updated 10 months ago
percent4 / llm_math_solver
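This project covers dataset synthesis, model training, and evaluation for LLM math problem-solving ability, along with notes on related articles.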
☆97Updated last year
beichao1314 / Open-Llama
The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.
☆67Updated 2 years ago
Open-Source-O1 / o1_Reasoning_Patterns_Study
☆104Updated 11 months ago
alibaba / ChatLearn
A flexible and efficient training framework for large-scale alignment tasks
☆439Updated last month