AlibabaPAI / torchacc
PyTorch distributed training acceleration framework
☆53 Updated 3 months ago
Alternatives and similar repositories for torchacc
Users that are interested in torchacc are comparing it to the libraries listed below
- LLM training technologies developed by kwai ☆66 Updated this week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆116 Updated 6 months ago
- ☆320 Updated 2 weeks ago
- ☆102 Updated last year
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models. ☆97 Updated 2 years ago
- A lightweight design for computation-communication overlap. ☆188 Updated last month
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆276 Updated 3 months ago
- Allow torch tensor memory to be released and resumed later ☆175 Updated 2 weeks ago
- Fast and easy distributed model training examples. ☆12 Updated last year
- ☆128 Updated last week
- ☆152 Updated 8 months ago
- ☆97 Updated 8 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆120 Updated 2 months ago
- Zero Bubble Pipeline Parallelism ☆437 Updated 6 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆80 Updated last year
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of… ☆179 Updated 3 weeks ago
- ☆130 Updated 11 months ago
- ☆147 Updated 11 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆444 Updated 6 months ago
- ☆152 Updated 10 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆144 Updated 2 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆119 Updated last year
- High performance Transformer implementation in C++. ☆142 Updated 10 months ago
- GLake: optimizing GPU memory management and IO transmission. ☆491 Updated 8 months ago
- Pipeline Parallelism Emulation and Visualization ☆71 Updated 5 months ago
- Efficient and easy multi-instance LLM serving ☆512 Updated 2 months ago
- A baseline repository of Auto-Parallelism in Training Neural Networks ☆147 Updated 3 years ago
- ☆112 Updated 6 months ago
- DeepSeek-V3/R1 inference performance simulator ☆168 Updated 8 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆169 Updated last month