xiabingquan / distributed_pytorch_from_scratch
PyTorch distributed training from scratch (for educational purposes only)
☆21Updated 10 months ago
Alternatives and similar repositories for distributed_pytorch_from_scratch
Users who are interested in distributed_pytorch_from_scratch are comparing it to the repositories listed below
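- Annotated nano_vllm repository, with MiniCPM4 adaptation and support for registering new models☆158Updated 6 months ago
- Sharing AI Infra knowledge and coding exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, LLM fundamentals 🧠, AI software and hardware 🔧, and more☆350Updated this week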
- Curated collection of papers in MoE model inference☆341Updated 3 months ago
- A collection of noteworthy MLSys bloggers (Algorithms/Systems)☆324Updated last year
- Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing☆68Updated last year
- ☆152Updated 7 months ago
- Build LLM from scratch☆85Updated 2 months ago
- LLM theoretical performance analysis tools supporting params, FLOPs, memory, and latency analysis.☆115Updated 7 months ago
- ☆47Updated last year
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length☆147Updated last month
- ☆117Updated last month
- Summary of some awesome work for optimizing LLM inference☆173Updated 2 months ago
- DeepSeek Native Sparse Attention pytorch implementation☆115Updated last month
- Implementations of several LLM KV cache sparsity methods☆41Updated last year
- Triton Documentation in Chinese Simplified / Triton 中文文档☆103Updated last month
- ☆118Updated 4 months ago
- Code release for book "Efficient Training in PyTorch"☆125Updated 10 months ago
- LLM Inference with Deep Learning Accelerator.☆58Updated last year
- Flash Attention from Scratch on CUDA Ampere☆129Updated 5 months ago
- Since the emergence of ChatGPT in 2022, the acceleration of large language models has become increasingly important. Here is a list of pap…☆283Updated 11 months ago
- Efficient Mixture of Experts for LLM Paper List☆166Updated 4 months ago
- Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM☆75Updated 6 months ago
- Learning TileLang with 10 puzzles!☆118Updated last week
- A repository sharing literature on large language models☆106Updated last month
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …☆313Updated 8 months ago
- UltraScale Playbook, Chinese edition☆131Updated 10 months ago
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"☆101Updated last month
- ☆155Updated 11 months ago
- A simple calculation for LLM MFU.☆66Updated 5 months ago
- ☆22Updated last year