waylandzhang / DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k
☆83 Updated 4 months ago
Alternatives and similar repositories for DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k
Users who are interested in DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k are comparing it to the repositories listed below.
- DPO training for Tongyi Qianwen (Qwen) ☆49 Updated 9 months ago
- Notes on reproducing various LLM components from scratch ☆203 Updated last month
- llm & rl ☆151 Updated this week
- ☆85 Updated 2 weeks ago
- ☆111 Updated 11 months ago
- Train an LLM from scratch on a single 24 GB GPU ☆55 Updated last month
- Full-parameter, LoRA, and QLoRA fine-tuning of llama3. ☆199 Updated 8 months ago
- Qwen3 Fine-tuning: Medical R1 Style Chat ☆79 Updated 3 weeks ago
- ☆79 Updated 10 months ago
- ☆109 Updated 7 months ago
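- A quick start to RAG and private deployment ☆191 Updated last year
- A large language model training and testing tool built on HuggingFace. Supports web UI and terminal inference for each model, parameter-efficient and full-parameter training (pre-training, SFT, RM, PPO, DPO), and model merging and quantization. ☆217 Updated last year
- A personal repository for experimenting with and reproducing the LLM pre-training process. ☆441 Updated last month
- Large language model applications: RAG, NL2SQL, chatbots, pre-training, MoE (mixture-of-experts) models, fine-tuning, reinforcement learning, and Tianchi data competitions ☆62 Updated 4 months ago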
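- This project covers dataset synthesis, model training, and evaluation for the math problem-solving ability of large models, along with notes on related articles.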