owenliang / qwen2.5-0.5b-grpo
Qwen2.5 0.5B GRPO
☆45 Updated 3 months ago
Alternatives and similar repositories for qwen2.5-0.5b-grpo
Users who are interested in qwen2.5-0.5b-grpo are comparing it to the libraries listed below
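- Train a LLaVA model with better Chinese support, with open-source training code and data. ☆56 Updated 8 months ago
- llm & rl ☆120 Updated this week
- DPO training for Tongyi Qianwen (Qwen) ☆47 Updated 7 months ago
- A project for training a large language model from scratch, covering pretraining, fine-tuning, and direct preference optimization; the model has 1B parameters and supports Chinese and English. ☆392 Updated 2 months ago
- Notes mainly covering multimodal knowledge for large language model (LLM) algorithm (application) engineers ☆194 Updated last year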
- ☆73 Updated 7 months ago
- ☆77 Updated 3 months ago
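- DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models) ☆162 Updated last year
- Some notes on reproducing LLM-related work from scratch ☆192 Updated 2 weeks ago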
- ☆74 Updated 9 months ago
- ☆30 Updated 9 months ago
- Latest Advances on Long Chain-of-Thought Reasoning ☆298 Updated last month
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ☆335 Updated 2 months ago
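- Hand-written interview questions (not LeetCode) for AI algorithms, with a focus on LLMs and also covering search, advertising, and recommendation, such as Self-Attention and AUC; these generally test a candidate's overall ability better than LeetCode and are closer to business needs and fundamentals ☆257 Updated 4 months ago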
- ☆154 Updated 3 months ago
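- Applications of large language models and multimodal models, mainly including small models, Agent, cross-modal search, OCR, RAG, ChatBot, and more ☆170 Updated this week
- ThinkLLM: 🚀 Lightweight, efficient implementations of large language model algorithms ☆53 Updated this week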
- In this fast-paced world, we all need a little something to spice up life. Whether you need a glass of sweet talk to lift your spirits or… ☆54 Updated 3 months ago
- ☆108 Updated 6 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models ☆125 Updated last week
- ☆322 Updated 3 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆122 Updated last month
- ☆81 Updated 3 weeks ago
- A reproduction of open-r1 that runs GRPO training on the 0.5B, 1.5B, 3B, and 7B qwen models and observes some interesting phenomena. ☆24 Updated last month
- A PyTorch reproduction of the transformer ☆78 Updated last year
- ✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ☆109 Updated last week
- Unlocking the many ways to use the HuggingFace ecosystem ☆90 Updated 5 months ago
- ☆40 Updated 2 months ago
- Courseware and code used in bilibili video explanations ☆15 Updated this week
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge ☆120 Updated 6 months ago