preacher-1 / MLA_tutorial
from MHA, MQA, GQA to MLA by 苏剑林, with code
☆16 Updated 2 months ago
Alternatives and similar repositories for MLA_tutorial:
Users who are interested in MLA_tutorial compare it to the repositories listed below
- An implementation of Transformer, BERT, GPT, and diffusion models for learning purposes ☆153 Updated 6 months ago
- LLM101n: Let's build a Storyteller (Chinese edition) ☆131 Updated 8 months ago
- Implementing a miniLLM from zero to one (hands-on LLM learning) ☆65 Updated 11 months ago
- ☆30 Updated 8 months ago
- DeepSeek Native Sparse Attention PyTorch implementation ☆62 Updated last month
- DPO training for Tongyi Qianwen (Qwen) ☆46 Updated 7 months ago
- ☆22 Updated last month
- ☆116 Updated this week
- Inference code for LLaMA models ☆120 Updated last year
- ☆68 Updated 6 months ago
- ☆107 Updated 5 months ago
- ☆79 Updated this week
- Training an LLM from scratch on a single 24 GB GPU ☆53 Updated 6 months ago
- Theory and practice of large-model/LLM inference and deployment ☆244 Updated last month
- ☆70 Updated 2 months ago
- Notes on learning reinforcement learning through animations ☆50 Updated 2 months ago
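- Hand-coded interview questions for AI algorithms, with a focus on LLMs plus search, advertising, and recommendation (not LeetCode), such as Self-Attention and AUC; these generally test a candidate's overall ability more than LeetCode does and sit closer to business needs and fundamentals ☆235 Updated 3 months ago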
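- Notes on from-scratch reproductions of LLM-related work ☆183 Updated this week
- Large language model applications: RAG, NL2SQL, chatbots, pretraining, MoE (mixture-of-experts) models, fine-tuning, reinforcement learning, Tianchi data competitions ☆60 Updated 2 months ago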
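- PyTorch distributed training ☆65 Updated last year
- ThinkLLM: 🚀 lightweight, efficient implementations of large language model algorithms ☆37 Updated last week
- How to train an LLM tokenizer ☆144 Updated last year
- An attempt to write an LLM from scratch, with reference to llama and nanogpt ☆58 Updated 11 months ago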
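- Training a LLaVA model with better Chinese support, with open-sourced training code and data. ☆54 Updated 7 months ago
- DeepSpeed tutorial, annotated examples, and study notes (efficient large-model training) ☆159 Updated last year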
- ☆35 Updated 3 weeks ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆72 Updated last month
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge ☆120 Updated 5 months ago
- Unlocking the many uses of the HuggingFace ecosystem ☆89 Updated 4 months ago
- Triton Documentation in Simplified Chinese ☆66 Updated last week