zhaochenyang20 / Awesome-ML-SYS-Tutorial
My learning notes/codes for ML SYS.
☆2,005 Updated this week
Alternatives and similar repositories for Awesome-ML-SYS-Tutorial:
Users that are interested in Awesome-ML-SYS-Tutorial are comparing it to the libraries listed below
- A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc. ☆3,943 Updated last week
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆3,184 Updated this week
- FlashInfer: Kernel Library for LLM Serving ☆2,788 Updated this week
- how to optimize some algorithm in cuda. ☆2,144 Updated last week
- LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc. 🔥 ☆3,896 Updated last week
- A self-learning tutorial for CUDA High Performance Programming. ☆615 Updated 3 weeks ago
- LLM notes, including model inference, transformer model structure, and llm framework code analysis notes. ☆755 Updated last week
- Reproduce R1 Zero on Logic Puzzle ☆2,327 Updated last month
- Large Language Model (LLM) Systems Paper List ☆1,212 Updated last week
- A PyTorch Native LLM Training Framework ☆797 Updated 4 months ago
- Official Repo for Open-Reasoner-Zero ☆1,904 Updated last month
- The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud. ☆1,050 Updated last week
- Distributed RL System for LLM Reasoning ☆1,205 Updated last week
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ ☆714 Updated this week
- Redis for LLMs ☆951 Updated this week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆1,089 Updated this week
- [TMLR 2024] Efficient Large Language Models: A Survey ☆1,145 Updated last month
- vLLM's reference system for K8S-native cluster-wide deployment with community-driven performance optimization ☆1,159 Updated this week
- Disaggregated serving system for Large Language Models (LLMs). ☆580 Updated last month
- Fast inference from large language models via speculative decoding ☆722 Updated 8 months ago
- Materials for learning SGLang ☆406 Updated last week
- The repository has collected a batch of noteworthy MLSys bloggers (Algorithms/Systems) ☆228 Updated 4 months ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆912 Updated 3 weeks ago
- MoBA: Mixture of Block Attention for Long-Context LLMs