preacher-1 / MLA_tutorial

from MHA, MQA, GQA to MLA by 苏剑林, with code

☆25

Alternatives and similar repositories for MLA_tutorial

Users that are interested in MLA_tutorial are comparing it to the libraries listed below

sunkx109 / llama
Inference code for LLaMA models
☆122 Updated last year
SmartFlowAI / LLM101n-CN
LLM101n: Let's build a Storyteller (Chinese edition)
☆132 Updated 11 months ago
RethinkFun / trian_ppo
☆91 Updated 10 months ago
chunhuizhang / pytorch_distribute_tutorials
PyTorch distributed tutorials
☆145 Updated last month
owenliang / bpe-tokenizer
LLM Tokenizer with BPE algorithm
☆33 Updated last year
AI-Study-Han / Zero-Qwen-VL
Train a LLaVA model with better Chinese support, and open-source the training code and data.
☆64 Updated 11 months ago
shreyansh26 / FlashAttention-PyTorch
Implementation of FlashAttention in PyTorch
☆159 Updated 6 months ago
datawhalechina / llm-deploy
Theory and practice of large model (LLM) inference and deployment
☆304 Updated 3 weeks ago
Mxoder / LLM-from-scratch
Notes on reproducing LLM-related work from scratch
☆210 Updated 3 months ago
firechecking / CleanTransformer
An implementation of Transformer, BERT, GPT, and diffusion models for learning purposes
☆155 Updated 9 months ago
dhcode-cpp / online-softmax
The simplest online-softmax notebook for explaining Flash Attention
☆13 Updated 7 months ago
mdy666 / mdy_triton
☆140 Updated last month
Glanvery / LLM-Travel
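Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to understanding, discussing, and implementing the techniques, principles, and applications related to large models.
☆329 Updated last year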
akaihaoshuai / baby-llama2-chinese_cybertron
Train an LLM from scratch on a single 24 GB GPU
☆56 Updated 3 weeks ago
harleyszhang / lite_llama
A lightweight LLaMA-like LLM inference framework based on Triton kernels.
☆144 Updated last week
owenliang / qwen-dpo
DPO training for Tongyi Qianwen (Qwen)
☆51 Updated 10 months ago
dhcode-cpp / easy-dualpipe
Pipeline-Parallel Lecture: Simplest Dualpipe Implementation.
☆25 Updated last month
liangyuwang / Tiny-DeepSpeed
Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library
☆41 Updated last week
liujunwen23 / MIRE
WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge
☆122 Updated 8 months ago
hengjiUSTC / learn-llm
☆112 Updated 8 months ago
lansinuote / Simple_RLHF_Llama3
☆31 Updated last year
zhanshijinwat / Steel-LLM
Train a 1B LLM on 1T tokens from scratch as an individual
☆707 Updated 3 months ago
bbruceyuan / LLMs-101
Implement a miniLLM from zero to one (hands-on LLM learning)
☆75 Updated last year
FlagAI-Open / OpenSeek
OpenSeek aims to unite the global open source community to drive collaborative innovation in algorithms, data and systems to develop next…
☆217 Updated last month
dhcode-cpp / NSA-pytorch
PyTorch implementation of DeepSeek Native Sparse Attention
☆83 Updated 5 months ago
chunhuizhang / bert_t5_gpt
☆73 Updated 2 months ago
chaoswork / llm_illustrated
Learn large models through illustrations
☆316 Updated last year
harleyszhang / llm_note
LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.
☆806 Updated last week
bbruceyuan / bit-brain
Train your own BitBrain (a mini LLM) with as little as an RTX 3090 🧠 (work in progress).
☆32 Updated last month
bobo0810 / LearnDeepSpeed
DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models)
☆173 Updated last year