alibaba / TePDist
TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.
☆86 · Updated last year
Related projects:
- HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of… (☆126, updated 3 weeks ago)
- A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters (☆152, updated 4 months ago)
- Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training. (☆260, updated last year)
- GLake: optimizing GPU memory management and IO transmission. (☆351, updated last month)
- PyTorch distributed training acceleration framework (☆16, updated this week)
- Dynamic Memory Management for Serving LLMs without PagedAttention (☆185, updated last month)
- Automated Parallelization System and Infrastructure for Multiple Ecosystems (☆70, updated last month)
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… (☆188, updated 3 weeks ago)
- Efficient and easy multi-instance LLM serving (☆119, updated this week)
- A fast communication-overlapping library for tensor parallelism on GPUs. (☆184, updated this week)
- Disaggregated serving system for Large Language Models (LLMs). (☆278, updated last month)
- A baseline repository of Auto-Parallelism in Training Neural Networks (☆138, updated 2 years ago)
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections (☆112, updated 2 years ago)
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling (☆55, updated 4 months ago)
- A home for the final text of all TVM RFCs. (☆99, updated 3 months ago)
- oneflow documentation (☆68, updated 2 months ago)
- High performance Transformer implementation in C++. (☆67, updated this week)
- A model compilation solution for various hardware (☆357, updated this week)
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. (☆251, updated 9 months ago)
- GPU-scheduler-for-deep-learning (☆192, updated 3 years ago)
- Curated collection of papers in machine learning systems (☆123, updated last month)
- Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving" (☆54, updated 3 months ago)
- An Efficient Pipelined Data Parallel Approach for Training Large Models (☆69, updated 3 years ago)