xiabingquan / distributed_pytorch_from_scratchLinks

PyTorch distributed training from scratch (for educational purposes only)

☆19

Alternatives and similar repositories for distributed_pytorch_from_scratch

Users that are interested in distributed_pytorch_from_scratch are comparing it to the libraries listed below

Sorting:

MLSys-Learner-Resources / Awesome-MLSys-Blogger
The repository has collected a batch of noteworthy MLSys bloggers (Algorithms/Systems)
☆302Updated 10 months ago
MoE-Inf / awesome-moe-inference
Curated collection of papers in MoE model inference
☆308Updated last month
LDLINGLINGLING / nano_vllm_note
注释的nano_vllm仓库，并且完成了MiniCPM4的适配以及注册新模型的功能
☆101Updated 3 months ago
InternLM / Awesome-LLM-Training-System
☆44Updated last year
chenhongyu2048 / LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
☆138Updated 3 weeks ago
ailzhang / EfficientPyTorch
Code release for book "Efficient Training in PyTorch"
☆112Updated 7 months ago
liangyuwang / Tiny-Megatron
Tiny-Megatron, a minimalistic re-implementation of the Megatron library
☆17Updated 2 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆282Updated 8 months ago
Sunt-ing / stick
A PyTorch-like deep learning framework. Just for fun.
☆156Updated 2 years ago
PKU-SEC-Lab / HybriMoE
[DAC'25] Official implement of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
☆90Updated 5 months ago
JF-D / Parcae
☆21Updated last year
harleyszhang / llm_counts
llm theoretical performance analysis tools and support params, flops, memory and latency analysis.
☆113Updated 4 months ago
mdy666 / mdy_triton
☆149Updated 4 months ago
TreeAI-Lab / Awesome-KV-Cache-Management
This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co…
☆248Updated 4 months ago
PKUFlyingPig / MIT6.5940_TinyML
Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing
☆60Updated 10 months ago
DeepLink-org / DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
☆82Updated this week
kwai / Megatron-Kwai
LLM training technologies developed by kwai
☆66Updated this week
dhcode-cpp / NSA-pytorch
DeepSeek Native Sparse Attention pytorch implementation
☆108Updated 3 weeks ago
Geralt-Targaryen / Awesome-Speculative-Decoding
Reading notes on Speculative Decoding papers
☆17Updated 4 months ago
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆142Updated 10 months ago
Zefan-Cai / Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
☆393Updated 8 months ago
HarryWu99 / llm_kvcache_sparsity
Implement some method of LLM KV Cache Sparsity
☆42Updated last year
PKUFlyingPig / CMU10-714
Learning material for CMU10-714: Deep Learning System
☆284Updated last year
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of …
☆292Updated 5 months ago
Toseic / LLM-inference-arxiv-daily
🎓Automatically Update LLM inference systems Papers Daily using Github Actions (Update Every 12th hours)
☆12Updated this week
xlite-dev / ffpa-attn
🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
☆231Updated last week
sihyeong / Awesome-LLM-Inference-Engine
☆152Updated 5 months ago
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆64Updated last year
YuxueYang1204 / CudaDemo
Implement custom operators in PyTorch with cuda/c++
☆74Updated 2 years ago
SiriusNEO / Triton-Puzzles-Lite
Puzzles for learning Triton, play it with minimal environment configuration!
☆564Updated 2 months ago