owenliang / qwen2.5-0.5b-grpo
Qwen2.5 0.5B GRPO
☆36Updated last month
Alternatives and similar repositories for qwen2.5-0.5b-grpo:
Users who are interested in qwen2.5-0.5b-grpo are comparing it to the repositories listed below
- DPO training for Qwen (Tongyi Qianwen)☆41Updated 6 months ago
- llm & rl☆81Updated this week
- Happy experimenting with MLLM and LLM models!☆103Updated 5 months ago
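- A PyTorch reimplementation of the Transformer☆76Updated last year
- A project that trains a large language model from scratch, covering pretraining, fine-tuning, and direct preference optimization (DPO). The model has 1B parameters and supports Chinese and English.☆321Updated last month
- Trains a LLaVA model with better Chinese support, and open-sources the training code and data.☆54Updated 6 months ago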
- ☆77Updated 4 months ago
- ☆75Updated 2 months ago
- Unlocking the many uses of the HuggingFace ecosystem☆88Updated 3 months ago
- LLM Tokenizer with BPE algorithm☆31Updated 10 months ago
- ☆57Updated 6 months ago
- qwen ai agent☆130Updated last year
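- Build a simple basic multimodal large model from scratch. 🤖☆34Updated 9 months ago
- Applications of large language models and multimodal models, mainly covering RAG, small models, agents, cross-modal search, OCR, and more☆158Updated 4 months ago
- Hand-written interview questions (not LeetCode) for AI algorithms, focused on LLMs and also covering search, advertising, and recommendation, such as Self-Attention and AUC. These generally test a candidate's overall ability more than LeetCode does and sit closer to business practice and fundamentals.☆213Updated 3 months ago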
- ☆31Updated 7 months ago
- Notes mainly on multimodal knowledge for large language model (LLM) algorithm (application) engineers☆175Updated 10 months ago
- ☆298Updated last month
- Qwen (Tongyi Qianwen) vLLM inference deployment demo☆556Updated last year
- DeepSpeed Tutorial☆95Updated 7 months ago
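- Welcome to LLM-Dojo, an open-source place for learning about large models. It uses concise, readable code to build a model training framework (supporting mainstream models such as Qwen, Llama, and GLM), an RLHF framework (DPO/CPO/KTO/PPO), and other features. 👩🎓👨🎓☆636Updated this week
- LLM applications: RAG, NL2SQL, chatbots, pretraining, MoE (mixture of experts) models, fine-tuning, reinforcement learning, and Tianchi data competitions☆58Updated last month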
- ☆85Updated 3 weeks ago
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge☆119Updated 4 months ago
- Qwen (Tongyi Qianwen) SFT experiments☆68Updated last year
- ☆30Updated 3 weeks ago
- Run through ChatGPT's technical pipeline from scratch.☆226Updated 6 months ago
- In this fast-paced world, we all need a little something to spice up life. Whether you need a glass of sweet talk to lift your spirits or…☆50Updated 2 months ago
- ☆57Updated 7 months ago
- Chinese documentation for Huggingface transformers☆223Updated last year