preacher-1 / MLA_tutorial
From MHA, MQA, and GQA to MLA, by Su Jianlin (苏剑林), with code
☆29 · Updated 8 months ago
Alternatives and similar repositories for MLA_tutorial
Users that are interested in MLA_tutorial are comparing it to the libraries listed below
- LLM101n: Let's build a Storyteller (Chinese edition) ☆133 · Updated last year
- LLM tokenizer with the BPE algorithm ☆41 · Updated last year
- ☆119 · Updated last year
- Trains a LLaVA model with better Chinese-language support and open-sources the training code and data. ☆73 · Updated last year
- Inference code for LLaMA models ☆125 · Updated 2 years ago
- An implementation of Transformer, BERT, GPT, and diffusion models for learning purposes ☆159 · Updated last year
- DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models) ☆178 · Updated 2 years ago
- PyTorch distributed tutorials ☆153 · Updated 4 months ago
- Interview questions and interview experiences from major tech companies for programmers ☆186 · Updated 5 months ago
- DeepSeek Native Sparse Attention PyTorch implementation ☆103 · Updated last week
- 青稞Talk (Qingke Talk) ☆150 · Updated last week
- UltraScale Playbook (Chinese edition) ☆80 · Updated 7 months ago
- DPO training for Tongyi Qianwen (Qwen) ☆56 · Updated last year
- Implementation of FlashAttention in PyTorch ☆171 · Updated 9 months ago
- ☆147 · Updated 3 months ago
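- An ecosystem of large language models and multimodal models, mainly covering cross-modal search, speculative decoding, QAT quantization, multimodal quantization, ChatBot, and OCR ☆190 · Updated 2 months ago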
- Performance testing for LLM inference services ☆43 · Updated last year
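- Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to understanding, discussing, and implementing the techniques, principles, and applications related to large models in depth. ☆344 · Updated last year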
- DeepSpeed Tutorial ☆102 · Updated last year
- A lightweight LLaMA-like LLM inference framework based on Triton kernels. ☆158 · Updated 3 weeks ago
- Large model/LLM inference and deployment: theory and practice ☆351 · Updated 3 months ago
- Qwen2.5 0.5B GRPO ☆69 · Updated 8 months ago
- Splices the vision head of SmolVLM2 onto the Qwen3-0.6B model and fine-tunes the combination ☆393 · Updated last month
- Learn large models through illustrations ☆320 · Updated last year
- Efficient Mixture of Experts for LLM Paper List ☆136 · Updated 3 weeks ago
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge ☆125 · Updated 11 months ago
- ☆52 · Updated 2 years ago
- Notes on learning reinforcement learning through animations ☆58 · Updated 8 months ago
- Triton Documentation in Simplified Chinese ☆86 · Updated 6 months ago
- Implementing a miniLLM from zero to one (hands-on LLM learning) ☆76 · Updated last year