owenliang / qwen2.5-0.5b-grpo
Qwen2.5 0.5B GRPO
☆49Updated 3 months ago
Alternatives and similar repositories for qwen2.5-0.5b-grpo
Users who are interested in qwen2.5-0.5b-grpo are comparing it to the repositories listed below.
- llm & rl☆134Updated last week
- ☆78Updated 8 months ago
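- DPO training for Tongyi Qianwen (Qwen)☆48Updated 8 months ago
- ThinkLLM: 🚀 Lightweight, efficient implementations of large language model algorithms☆67Updated 3 weeks ago
- Train a LLaVA model with better Chinese support, with open-sourced training code and data.☆59Updated 9 months ago
- Notes mainly covering multimodal knowledge for large language model (LLM) algorithm (application) engineers☆198Updated last year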
- ☆329Updated 3 months ago
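- A project for training a large language model from scratch, including pretraining, fine-tuning, and direct preference optimization (DPO); the model has 1B parameters and supports Chinese and English.☆408Updated 3 months ago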
- ☆79Updated 4 months ago
- ✨First Open-Source R1-like Video-LLM [2025/02/18]☆342Updated 3 months ago
- DeepSpeed tutorials & annotated examples & study notes (efficient training of large models)☆164Updated last year
- ✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning☆136Updated 3 weeks ago
- ☆179Updated last month
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆613Updated 2 weeks ago
- LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation☆122Updated last month
- ☆269Updated last week
- PyTorch distributed tutorials☆136Updated last week
- Awesome RL-based LLM Reasoning☆506Updated last month
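- Unlocking the many uses of the HuggingFace ecosystem☆91Updated 5 months ago
- Hand-coded interview questions on LLMs (the main focus) and on other AI algorithms such as search, advertising, and recommendation (not LeetCode), e.g. Self-Attention and AUC; these generally test a candidate's overall ability more than LeetCode does and sit closer to business needs and fundamentals☆275Updated 5 months ago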
- ☆76Updated 9 months ago
- ☆83Updated last month
- Courseware and code used in bilibili video lectures☆16Updated 3 weeks ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆120Updated last week
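- The book 《多模态大模型:新一代人工智能技术范式》 (Multimodal Large Models: A New-Generation Artificial Intelligence Technology Paradigm), by Liu Yang and Lin Liang☆210Updated 6 months ago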
- Latest Advances on Long Chain-of-Thought Reasoning☆343Updated last week
- ☆31Updated 9 months ago
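- Welcome to LLM-Dojo, an open-source place for learning about large models. It uses concise, readable code to build a model training framework (supporting mainstream models such as Qwen, Llama, GLM, and more), an RLHF framework (DPO/CPO/KTO/PPO), and other features. 👩🎓👨🎓☆756Updated 2 weeks ago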
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge☆123Updated 6 months ago
- ☆166Updated 9 months ago